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ABSTRACT 

We present a multi-scale, multi-wavelength source extraction algorithm called getsources. Although it has been designed primarily 
for use in the far-infrared surveys of Galactic star-forming regions with Herschel, the method can be applied to many other astronom- 
ical images. Instead of the traditional approach of extracting sources in the observed images, the new method analyzes fine spatial 
decompositions of original images across a wide range of scales and across all wavebands. It cleans those single-scale images of 
noise and background, and constructs wavelength-independent single-scale detection images that preserve information in both spatial 
and wavelength dimensions. Sources are detected in the combined detection images by following the evolution of their segmentation 
masks across all spatial scales. Measurements of the source properties are done in the original background- subtracted images at each 
wavelength; the background is estimated by interpolation under the source footprints and overlapping sources are deblended in an 
iterative procedure. In addition to the main catalog of sources, various catalogs and images are produced that aid scientific exploitation 
of the extraction results. We illustrate the performance of getsources on Herschel images by extracting sources in sub-fields of the 
Aquila and Rosette star-forming regions. The source extraction code and validation images with a reference extraction catalog are 
freely available. 

Key words. Stars: formation - Infrared: ISM - Submillimeter: ISM - Methods: data analysis - Techniques: image processing - 
Techniques: photometric 



1. Introduction 

The Herschel Space Observatory (Pilbratt et al. 2010) provides 
the best opportunity to study the earliest stages of star formation. 
Prestellar cores and young (Class 0) protostars emit the bulk of 
their luminosities at wavelengths 80-400 yum, which makes the 
Herschel imaging instruments PACS (Poglitsch et al. 2010) and 
SPIRE (Griffin et al. 2010) with their 6 wavebands from 70 to 
500 yum perfect for performing a census of these objects down 
to 0.01-0.1 Mq in the nearby (distances D ^ 500 pc) molecu- 
lar cloud complexes. In particular, the Herschel Gould Belt sur- 
vey (Andre et al. 2010) aims at probing the link between diffuse 
cirrus-like structures and compact cores with the main goal to 
understand the physical mechanisms of the formation of prestel- 
lar cores out of the diffuse medium, which is crucial for under- 
standing the origin of stellar masses. Furthermore, the Herschel 
HOBYS survey (Motte et al. 2010) aims at performing a census 
of massive young stellar objects, providing accurate bolometric 
luminosities and envelope masses for homogeneous and com- 
plete samples of the progenitors of massive stars. 

Preparing for these two Herschel surveys, we had evaluated a 
few popular source extraction algorithms to check whether they 
could be used in our surveys. The main problem was the ab- 
sence of any multi-wavelength extraction technique. None of the 
methods was designed to handle multi- wavelength data, making 
it necessary to match the independent catalogs obtained at differ- 
ent wavelengths using an association radius as a free parameter. 
This posed very serious problems for detecting and measuring 



Send offprint requests to: Alexander Men'shchikov 
Correspondence to: alexander.menshchikov@cea.fr 



sources in the Herschel images with angular resolutions differ- 
ing by a factor of ~ 7. A direct consequence of the mismatch in 
resolutions is that the degree to which sources in a region may 
be blended depends very strongly on the waveband. 

With such great differences in resolution, one cannot just 
match independent catalogs without introducing unknown and 
potentially very large errors in the association of sources de- 
tected across different wavebands and in their measured prop- 
erties. Studying star formation in the nearest clouds, one ex- 
pects to resolve many crowded regions in the highest-resolution 
images at the shortest wavelengths, but at the same time, one 
would progressively "lose" sources within much larger beams at 
the longer wavelengths. In effect, the fluxes of such sources on 
the long-wavelength side of their spectral energy distributions 
(SEDs) would have large and unknown errors when matching 
independent extraction catalogs, greatly reducing the extraction 
quality (detection completeness and reliability, as well as the ac- 
curacy of the derived properties). 

For large-scale projects, such as the Herschel Gould Belt and 
HOBYS surveys, one needs a fully automated source extraction 
method that can find as many sources as possible from the im- 
ages in all bands, reliably distiguishing them from variable back- 
grounds and noise, deblending them as accurately as possible 
in the crowded regions by preserving and utilizing all informa- 
tion obtained from the higher-resolution images at shorter wave- 
lengths. A very serious problem with some existing source ex- 
traction methods is that they do not allow sources to overlap, thus 
deblending of crowded regions is impossible. One cannot expect 
from such methods high levels of detection completeness or con- 
sistently accurate flux measurements in realistic conditions. 
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Note that most existing methods have been developed and 
oriented for use in different areas of astronomy, thus their per- 
formances for a specific project must be carefully analyzed be- 
fore an appropriate method can be chosen. The SPIRE SAG3 
consortium has been testing several popular source extrac- 
tion algorithms using simulated skies of various degrees of 
complexity and those benchmarks will be published elsewhere 
(Men'shchikov et al. in prep.). Below we introduce the reader to 
the basic concepts of those techniques, to place our new method 1 
in a wider context. 

1.1. Existing methods of source extraction 

Most of the algorithms trying to solve the same (non-trivial) 
problem of source extraction originated from different ideas. 

Stutzki & Guesten (1990)'s gaussclumps (developed for 
position- velocity data cubes in molecular-line studies of molecu- 
lar clouds) performs least-squares fits of a Gaussian shape to the 
brightest peak constrained to keep the position and amplitude of 
the fitted shape close to the image maximum. Then it subtracts 
the fit from the image, producing the residuals image, and fits a 
new Gaussian shape to the brightest peak in the residuals. The it- 
erations continue until the total intensity of all subtracted clumps 
is equal to the integrated intensity of the original image or there 
are no significant peaks left. 

Williams et al. (1994)'s clump find (aimed also for molec- 
ular clouds data and position-velocity cubes) contours an im- 
age at a number of levels, starting from the brightest peak in the 
image. It descends down to a minimum contour level, marking 
as clumps along the way all connected areas of pixels that are 
above the contour level. This technique was "motivated by how 
the eye decomposes the maps into clumps" and it "mimics what 
an infinitely patient observer would do" (Williams et al. 1994). 
One should be aware that this method ignores the backgrounds 
in which sources are observed. 

Bertin & Arnouts (1996)'s sextractor (designed for use 
with optical and near-IR images in extragalactic astronomy) 
estimates and subtracts background using sigma clipping and 
spline interpolation, then uses thresholding to find sources in 
the background-subtracted image, deblends them if they overlap, 
and measures their positions and sizes using intensity moments. 
A very useful property of this versatile algorithm is that it allows 
using a detection image that differs from the observed image and 
it can match sources with previously obtained catalogs. 

CUPID 2 's reinhold (oriented primarily to analyzing clumps 
in submillimeter data cubes) identifies "pixels within the image 
which mark the edges of emission clumps, producing a set of 
rings around the clumps. However, these structures can be badly 
affected by noise in the data and so need to be cleaned up. This 
is done using cellular automata which first dilate the rings or 
shells, and then erode them. After cleaning, all pixels within each 
ring or shell are assumed to belong to a single clump" (see the 
reference in the footnote). 

CUPID 's fellwalker (also oriented towards clumps in sub- 
millimeter data cubes) finds image peaks by tracing the line of 
the steepest ascent, considering every pixel with a value above a 
specified threshold as a starting point for a walk to a peak along 
the steepest gradient. Having reached a peak, it searches for 



1 For a very brief summary, see Men'shchikov et al. (2010). 

2 CUPID is a source extraction software package developed by the 
STARLINK team for use with the SCUBA2 surveys; it is a general 
wrapper to which additional methods can be added. See documentation: 
http://docs.jach.hawaii.edu/star/sun255.htx/sun255.html 



an even higher pixel intensity in a neighborhood; when found, 
the algorithm switches to that pixel and continues uphill. If a 
peak is found that is higher than all pixels in its neighborhood, 
a clump has been detected and the algorithm marks all pixels 
visited along the way as belonging to the clump. 

Motte et al. (2003, 2007)'s mre-gcl (created for studies 
of star formation using ground-based submillimeter continuum 
imaging) combines gaussclumps with the image filtering based 
on a wavelet decomposition. The algorithm decomposes an im- 
age in spatial scales using an isotropic wavelet decomposition 
with the multi-resolution code mrJransform (Starck & Murtagh 
2006), subtracts all scales larger than the largest scale of inter- 
est from the original image, and uses gaussclumps to detect and 
measure sources in the filtered image. Then the user defines each 
source's largest extent as twice its measured size and repeats the 
decomposition and filtering steps for each source, runs gauss- 
clumps again with the aim to improve the measurements of sizes 
and fluxes. 

Molinari et al. (2011)'s cutex (developed for studying star 
formation with Herschel) attempts to overcome the difficulty 
of thresholding of an entire image: highly- variable backgrounds 
were expected in star-forming regions and indeed observed with 
Herschel. It analyzes multi-directional second derivatives of the 
original image and performs curvature thresholding to isolate 
compact sources out of extended emission, then fits variable- 
size elliptical Gaussians (adding also a planar background) at 
their positions. The algorithm can fit up to 8 Gaussians simulta- 
neously in crowded areas, if sources are closer than two observa- 
tional beam sizes. This method works only for compact sources 
with sizes up to approximately 3 times the beam size. 

Kirk et al. (2012, in prep.)'s csar (developed for use with 
the BLAST and Herschel images) is another method that defines 
clumps in terms of connected pixels. A source is defined as a 
region of connected pixels bound by a closed isophotal contour 
that contains at least one pixel that is at 3 cr (where <x is the stan- 
dard deviation) above the bounding contour level. The algorithm 
starts contouring just below the peak on the image and walks 
down until some predefined background is reached; sources are 
considered finished just before they become connected to oth- 
ers. Sources are not allowed to overlap and no attempt is made 
to assign flux outside of the closed contours to any source. The 
technique was designed with a "purpose of replicating what a 
(trained) human would do with an image if extracting sources 
manually" (J. Kirk, private comm.). 

Crowded regions that are frequently observed with Herschel; 
the deblending of overlapping sources is the origin of major un- 
certainties. Whereas clump find, reinhold, fellwalker, and csar 
merely partition the image between sources not allowing them 
to overlap, gaussclumps, sextractor, mre-gcl, and cutex can de- 
blend overlapping sources, which is quite an essential property 
for obtaining accurate results in crowded regions. We feel the 
need to stress here that the observed images are only projections 
of the complex three-dimensional reality onto the plane of the 
sky and, as such, it is a fundamental source of major uncertain- 
ties in the interpretation of observations and in the derived prop- 
erties of objects. As we know from Herschel observations (e.g., 
Men'shchikov et al. 2010; Arzoumanian et al. 2011), the inter- 
stellar medium is highly filamentary, thus the filaments' orienta- 
tions play a very significant role in the appearance of the regions 
we observe. 
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1.2. Introducing a new approach 

In this paper, we present the source extraction algorithm get- 
sources developed with the aim to overcome the shortcomings 
of the existing methods and provide researchers in this Herschel 
era with a better extraction tool. For details on the astrophysical 
context of this work, we refer to Appendix A. 

To clarify our terminology, we shall use the term noise to 
refer to the statistical instrumental noise including possible con- 
tributions from any other signals that are not astrophysical in 
nature, i.e. which are not related to the emission of the areas 
in space we are observing. In contrast, the term background 
will refer to the (filamentary) astrophysical backgrounds, such 
as cirruses or molecular clouds, containing the sources we are 
observing and objects we are studying (e.g., stars, protostars, 
cores, etc.) 3 . In order to reduce possible confusion, we are go- 
ing to clearly distinguish between the morphologically-simple 
(convex, not very elongated) sources of emission as determined 
by the source extraction algorithms and the objects of specific 
astrophysical nature that are selected from the entire extraction 
catalog on the basis of all available information (besides the im- 
ages) and some additional assumptions, criteria, and techniques, 
depending on the specific interests of a researcher. 

The problem of detecting sources in continuum images usu- 
ally reduces to finding significant intensity peaks, as such images 
provide us with just complex intensity distributions of the sky 
over an observed area. At the present state of the art, source ex- 
traction procedures do not know anything about the astrophysi- 
cal nature or true physical properties of the objects that produced 
the emission of those significant peaks. An extraction algorithm 
can only detect sources (that are possibly harboring our objects 
of interest) and determine their apparent two-dimensional inten- 
sity distributions above the variable background and noise, and 
measure their apparent properties at each wavelength. This is 
the only information contained in the images, which is available 
to any extraction algorithm. The purpose of source extraction is 
to detect as many real sources as possible (distinguishing them 
from noise and background fluctuations) and to measure their 
apparent properties as accurately as possible 4 . 

A catalog of such sources serves as the fundamental basis for 
all subsequent in-depth studies of the various objects of different 
nature. Only after we have detected significant sources of emis- 
sion and measured all possible apparent properties (peak intensi- 
ties, integrated fluxes, sizes, etc.), can we utilize those results to 
infer the real astrophysical nature and properties of the objects 5 . 
At this step, one should combine all the information in different 
wavebands from the extraction catalog and also from previous or 
follow-up observations, as well as any other available informa- 
tion, and classify the objects according to their physical nature 
and use them to derive new astrophysical knowledge. In what 



3 The backgrounds we are dealing with are known to be structured at 
all scales, spatially fluctuating in an unknown way and thus creating the 
difficult problem of background removal. If the real backgrounds were 
just smooth large-scale structures, one would be able to approximate 
and subtract or filter them quite well. 

4 Different extraction algorithms produce varying numbers of sources 
and estimates of their properties for the same set of images. The quality 
of the extraction methods can be assessed by measuring how well they 
are able to reconstruct known properties of model sources in simulated 
images. 

5 Real physical properties of objects (e.g., their size) may be quite 
different from the measured apparent properties of the extracted sources 
at different wavelengths, whose accuracy, in turn, critically depends on 
the quality of the extraction algorithm. 



follows, we only consider the sources and leave the definition 
of objects to the future papers exploring various astrophysical 
applications of this new extraction algorithm. 

Unlike other source extraction algorithms (except mre-gcl, 
Sect. 1.1), the new method analyzes fine spatial decompositions 
of original images across a wide range of scales and across 
all wavelengths (Sect. 2.2). As part of its multi-wavelength de- 
sign, getsources removes the noise and background fluctuations 
from the decomposed images (Sect. 2.3) separately in each band, 
and constructs a set of wavelength-independent detection images 
(Sect. 2.4) that preserve information in both spatial and wave- 
length dimensions as well as possible. Sources are detected in 
the combined detection images by following the evolution of 
their segmentation masks across all spatial scales (Sect. 2.5). 
Measurements of the source properties are performed in the orig- 
inal images at each wavelength after the background has been 
subtracted by interpolation under the sources' "footprints" and 
after overlapping sources have been deblended (Sect. 2.6). To 
facilitate visual analysis of the extraction results and various 
steps of the algorithm, a number of useful images are created 
for each waveband (Sect. 2.7). Based on the results of the ini- 
tial extraction, detection images are "flattened" to produce much 
more uniform noise and background fluctuations in preparation 
for the second, final extraction (Sect. 2.8). The performance of 
getsources for Herschel images is illustrated on small sub-fields 
of the Aquila and Rosette star- forming regions (Sect. 3). 

2. The getsources extraction method 

The fundamental problem in extracting sources from observed 
images is that all spatial scales are mixed together and the inten- 
sity of any given pixel contains an unknown contribution from 
the noise, background, and surrounding blended sources. The 
central problem in accurate source extraction is to separate those 
contributions from the signal of the real sources. 

The main idea of getsources is to analyze decompositions 
of original images (at each wavelength) across a wide range of 
spatial scales separated by only a small amount (typically ~ 3- 
5%). Replacing originals with a set of strongly filtered images 
brings several significant advantages. Each of the "single scales" 
contains non-negligible signals from only a relatively narrow 
range of spatial scales, mostly only from those sources (and the 
noise and background fluctuations) which have sizes similar to 
the scale considered. In effect, this automatically filters out all 
contributions of the noise, background, or overlapping sources 
on irrelevant (much smaller and larger) spatial scales. An im- 
mediate benefit is that such a filtering allows one to manipu- 
late entire single- scale images as a whole and use thresholding 
to separate sources from the background and noise in the ob- 
served images (see Sect. 2.3 for details). Furthermore, consider- 
ing the same spatial scales across a wide range of wavelengths 
allows one to sum up single-scale images at all wavelengths in 
combined (wavelength-independent) single- scale detection im- 
ages and thus preserve the high-resolution information across 
all wavebands, minimizing the effect of degrading resolutions. 
Besides providing a substantial "super-resolution" effect, this 
eliminates the need of matching multiple catalogs obtained with 
different beams and reduces the matching and measurement er- 
rors. 

The extraction method is represented by 7 processing blocks 
shown in Fig. 1; they will be described below in Sects. 2.1-2.7. 
In order to make a clear distinction between images and var- 
ious other parameters, the images are denoted throughout this 
paper by the capital calligraphic characters (e.g., &l,B,C', see 
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Prepare observed & detection images, 
observational masks: I K -» I xo , J AD ; M K 



Decompose detection images 
in many spatial scales: r AD -»J ADy 



Clean single scales by removing 
noise and background: I KDj -» J ADy . G 



Combine clean single-scale images 
over all wavelengths: J ADC I DjC 



Detect & catalog sources in combined 
clean single-scale images: I Djc -» C D 



Measure & catalog sources' properties 
at all wavelengths: C D ,I A0 -»C MA 

Visualize all extracted sources and each 
individual source: C MA , I xo -* I AE/ J Ay 



Fig. 1. Main processing blocks of the getsources algorithm described in 
Sects. 2.1-2.7. 

Appendix B for a list of all symbols and definitions). The fol- 
lowing subsections describe the algorithm in full detail. 

2.1. Preparing observed and detection images 

The first step (Fig. 1) towards the source extraction is to convert 
the original images Ix at all wavelengths to the same grid. This 
means to transform them into the observed images Ixo, all with 
the same numbers of pixels, the same pixel size, aligned across 
wavelengths as accurately as possible (covering the same area on 
the sky), the same reference pixel and its coordinates. In practice, 
this is done by resampling all images to the same pixel size using 
the astronomical utility SWarp (Bertin et al. 2002). 

Note that the alignment of images must be carefully checked 
before extracting sources with getsources, because of its multi- 
wavelength design. Images in all wavebands will be combined 
together in wavelength-independent detection images (Sect. 2.4) 
that will only be good if the originals are aligned within one 
pixel; significant misalignments of the images can create spuri- 
ous sources 6 . 

The observed images Ixo are only used to measure prop- 
erties of detected sources, at the end of an extraction (Fig. 1, 
Sect. 2.6). Most of the processing in the algorithm is done on 
the detection images Ixd, which in the simplest case may be the 
same as Ixo, but in general can significantly differ from the latter. 
Any transformations of the observed images that have a potential 
to improve detection quality (such as completeness, reliability) 
can be used as Ixd', this can be convolution, multiplication by 

6 In practice, the most accurate approach to alignment is to use im- 
ages containing only small scales (see Sect. 2.2), up to ~twice the res- 
olution in each waveband, as they show misalignments most clearly. 
One should carefully choose which peaks to align, as the appearance of 
sources may be affected by radiative transfer effects or by fluctuating 
backgrounds or by the close proximity to other sources. 



weight images, subtraction of baseline images, etc. For exam- 
ple, one may want to sacrifice a little bit of the nominal angular 
resolution in order to reduce the unphysical pixel-to-pixel noise 
present in the images (on scales smaller than the observational 
beam size Ox) using convolution Ixd = Qa*Iao, where Qx is the 
smoothing Gaussian with full width at half maximum (FWHM) 
chosen to slightly degrade (by ~5%) the image resolution Ox- 
This suppresses unphysical noise and small-scale artifacts in Ixd 
that otherwise may become enhanced in the smallest- scale de- 
composed images. Being the default way of creating detection 
images in getsources, such smoothing is not required and may 
be skipped, if not deemed beneficial. Another example of a de- 
tection image that may be useful in some applications is a col- 
umn density map produced by pixel-to-pixel SED fitting of the 
observed images. 

The last part of the preparation is creating the observational 
masks Mx- Those are images with the pixel values of either 1 or 
0, defining the areas in the original images that we are interested 
in. In practice, some areas of the observed rectangular images 
may not have been covered, some other areas may contain high 
noise or artifacts. The mask images are used by getsources to 
exclude from processing any area of Ixd in which the mask has 
zero values; in the simplest ideal case, Mx has values of 1 in all 
pixels. Very noisy areas have the potential to affect the cleaning 
and detection algorithms described below and every effort must 
be made to exclude such areas using carefully prepared observa- 
tional masks. 

2.2. Decomposing detection images in spatiai scales 

The spatial decomposition is done by convolving the original 
images with circular Gaussians and subtracting them from one 
another (we call this procedure successive unsharp masking): 

Iadj = ffj-i * Ixd - Qj * Ixd 0' = h 2, . . . , N s ), (1) 

where Ixd is the detection image (Sect. 2.1) at wavelength X, 
IxDj are its "single-scale" decompositions, Qj are the smooth- 
ing Gaussian beams (Qo is a two-dimensional delta- function). 
The latter have FWHM sizes Sj = f s Sj-i in the range 2A<Sj< 
Smax, where A is the pixel size, f$ > 1 is the scale factor, S max is 
the maximum spatial scale considered, and the number of scales 
TVs depends on the value of f$ (typically ^1.05) and S m?LX . We 
adopt S max = 3 max {A™ ax }, where A™ ax is the maximum FWHM 
sizes of sources to be extracted and an upper limit for S^ax is the 
image size 7 . 

Equation 1 implicitly assumes that the convolved images are 
properly rescaled to conserve their total flux; therefore, the orig- 
inal image can be recovered by summing up all scales: 

N s 

Ixd - ^ IxDj + Qn s * Ixd- (2) 

Before convolution, the input images Ixd are expanded from 
the edges of the areas covered by the observational masks Mx 
towards the image edges and the entire images are further ex- 
panded on all sides by a large enough number of pixels (3 Sj/A) 
in order to avoid undesirable border effects. Both expansions are 

7 The wavelength-dependent maximum sizes of sources are the only 
user-definable parameters in getsources. The actual maximum sizes de- 
pend on the observed images and the specific interest of a researcher. 
Before extracting sources, one has to obtain reasonable guesses of the 
maximum source sizes from the images and specify them in the config- 
uration file (the parameter A™ ax defaults to 6 A ). 
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Fig. 2. Single-scale decomposition (Sect. 2.2). The central 0? 44x0? 44 sub-field of the detection image I AD of a l°xl° simulated star-forming region 
at 350 iim {upper left). Its single-scale images I ADj are shown at the scales indicated (left to right, top to bottom) for j=ll, 30, 43, 57, 70, N$ = 99, 
fs = 1.053, S i = 4" ', S Ns = 660" (see Eq. 1). The image dimensions are 1800x1800 pixels and the pixel size A = 2". The scales were selected to be 
separated by a factor of 2 to illustrate the spatial decomposition. The scale sizes Sj are visualized by the yellow-black circles and annotated at the 
bottom of the panels. For better visibility, the values displayed in the panels are somewhat limited in range; the color coding is a linear function of 
intensity in MJy/sr. 



performed using the pixel values at the edges of the masks and 
images, respectively; after convolution, the images are reduced 
back to their original size. 

Small values of fs ensure the best spatial resolution of the 
single scales, just like fine mesh sizes always better resolve 
structures in numerical methods. For practical reasons, the min- 
imum value of fs is 1.03 and the maximum value of N$ is 99. 
For large fs, the single scales actually contain mixture of a wide 
range of scales, and faint small-scale structures become com- 
pletely diluted by the contribution of irrelevant scales. Again, 
this is very similar to any finite-difference numerical methods, 
where structures smaller than a few grid zones disappear within 
large structures resolved by coarse grids 8 . 

To illustrate the spatial decomposition and all other process- 
ing steps of getsources, we shall use images of a simulated star- 
forming region that we constructed well before the launch of 
Herschel in order to have a reasonably realistic model for testing 
various aspects of our future observational program (the source 



8 Usually best results are obtained with N s and fs close to their lim- 
iting values. For / s = 2, the decomposition of Eq. 1 is identical to that 
produced by the multi-resolution code mrJransform (Starck & Murtagh 
2006) with its default linear wavelet transform ("a trous" algorithm). 



extraction methods, instruments simulators, etc.); we refer to 
Appendix C for more details. The spatial decomposition of im- 
ages using convolution has a clear interpretation in terms of the 
Fourier transform. Interested readers are referred to Appendix D, 
where we present the Fourier amplitudes for the individual com- 
ponents of the simulated sky and for a few selected scales of 
the decomposed images, as well as examples from the actual 
Herschel observations 9 . 

A sub-field of the simulated region at 350 /mi (Fig. 2, upper 
left) clearly shows all ingredients: the cirrus background, proto- 
stars (bright compact peaks), and fainter starless cores of vari- 
ous sizes, from completely unresolved to very extended. Many 
sources vanish into the background and also many sources are 
blended with others. However, the single-scale decompositions 
filter out emission at all irrelevant scales and display the sources 
with a much higher contrast than the full image does. The de- 
composition of Eq. 1 thus naturally selects sources of specific 
sizes, which become best visible in the images with matching 



9 Except the spatial decomposition step, where convolutions are done 
using a fast Fourier transform algorithm, our method has been designed 
to operate in the image space, which is a natural and intuitive way of 
source extraction. 
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Fig. 3. Single-scale removal of noise and background (Sect. 2.3). The field of Fig. 2 is shown as the full clean image I ADC at 350 /urn (upper left) 
that accumulates clean images over all scales. The same set of spatial scales is displayed in the single-scale images Ixojc (left to right, top to 
bottom), cleaned of noise and background with an iterative procedure described in Sect. 2.3. All intensity peaks visible in the scales belong to the 
sources; most of the noise and background fluctuations (visible in Fig. 2) have been removed. The scale sizes Sj are visualized by the yellow-black 
circles and annotated at the bottom of the panels. For better visibility, the values displayed in the panels are somewhat limited in range; the color 
coding is a function of the square root of intensity in MJy/sr. 



scales. The negative rings around bright sources are the direct 
consequence of the successive unsharp masking, the subtraction 
of an image convolution with a larger smoothing beam from an 
image convolved with a smaller beam. All peaks at the first four 
scales shown in Fig. 2 are identifiable with the corresponding 
peaks in the full image. However, the situation becomes more 
problematic as we proceed to larger scales, such as that displayed 
in the last panel of Fig. 2. For even larger scales, up to the en- 
tire image size, intensities from sources of all sizes become so 
heavily mixed and diluted in the large smoothing beams that one 
cannot disentangle the individual structures anymore. 

This demonstrates the need to set a reasonable upper limit 
for the sources to be extracted 10 . Indeed, if huge structures were 
to be allowed (with sizes by orders of magnitude larger than 
Ox), they would also have to be used in the background sub- 
traction and deblending, both of which become increasingly in- 
accurate on very large scales. For accurate removal of the back- 
ground, one has to either approximate or interpolate it; both ap- 
proaches become highly uncertain when large distances are in- 
volved. Deblending of overlapping structures requires a good 



10 Note that getsources has no fundamental limitation on the spatial 
scales or source sizes except they must be smaller than the image size. 



approximation of their intensity distributions, which also be- 
comes inaccurate on very large scales. This would also greatly 
reduce the quality of the detection and measurements for the ma- 
jority of "normal" sources. Many of the latter would be fully 
contained within the much larger (sub- structured) sources and to 
accurately measure them, one has to consider the large sources 
as their background 11 . 

2.3. Cleaning single scales of noise and background 

Before one can use the single- scale detection images I^oj for 
source extraction, they need to be cleaned of the contributions 
of noise and background to make sure that most (if not all) non- 
zero pixels belong to real sources. The noise and background 
fluctuations in the far-infrared and submillimeter images of in- 
terstellar clouds, such as those coming from the Herschel Gould 
Belt and HOBYS surveys, do not follow Gaussian statistics. A 
great advantage of the fine spatial decomposition employed by 



11 If one is interested primarily in extracting very large structures, one 
could first extract all smaller sources, subtract them from the original 
image, and then run the extraction again, targeted specifically at those 
structures. 
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Fig. 4. Single-scale cleaning residuals (Sect. 2.3). The field of Fig. 2 is shown as the reconstructed image Iadr of the residuals at 350 //m (upper 
left) that accumulates cleaning residuals from all scales. The same set of spatial scales is displayed in the single-scale images of the residuals I^jR 
(left to right, top to bottom). The cleaning procedure left no significant intensity peaks of the simulated objects in the residuals, only the noise- and 
background-dominated pixels (cf. Fig. 7). The scale sizes Sj are visualized by the yellow-black circles and annotated at the bottom of the panels. 
The color coding is a linear function of intensity in MJy/sr. 



getsources (Sect. 2.2) is that the emission fluctuations in the 
decomposed images of interstellar clouds or Galactic cirrus be- 
come approximately Gaussian. Thus, significant departures from 
Gaussian distribution in the single-scale images indicate the 
presence of sources. For more details and illustrations in terms 
of the power spectra of the components of the actual Herschel 
images, we refer to Appendix E; see also Fig. 5 below. 

Cleaning can be done by global intensity thresholding of the 
single-scale images, as the larger-scale background has been ef- 
fectively filtered out by the spatial decomposition. Unlike the 
original images I^o or I^d that often have a very strong and 
highly-variable background, the entire single-scale images are 
"flat" in the sense that all signals on considerably larger scales 
have been removed (see Fig. 2). Another advantage of this 
single-scale cleaning is that the noise contribution depends very 
significantly on the scale. For example, the small-scale noise 
gets heavily diluted at large scales, where extended sources 
become best visible. In effect, in a reconstructed clean image 
Xidc = Zj IxDjC one can see large structures better (deeper) than 
inX| D . 

To clean the single- scale images, we designed an iterative 
algorithm 12 that automatically finds at each scale a cut-off level 



A procedure similar to what is usually called "sigma clipping". 



that separates the signal of significant sources from those of the 
noise and background. At the first scale (j=l) it computes the 
cut-off (threshold) tu^ = n A j cr^j for the image I^j Ma, where 
CTxj is the standard deviation over the entire image and n^j is a 
variable factor having an initial value of n^i = 6 (this j=l value 
was found to be optimal in our tests). Then the procedure masks 
out all pixels with the values \I^\ > TUxj and repeats the calcula- 
tion of o~xj over the remaining pixels, estimating a new threshold, 
which is generally lower than the one at the previous iteration. 
The procedure masks out bright pixels again and iterates fur- 
ther, always computing cr^j at < TUxj, outside the peaks and 
hollows, until TUxj converges (Aw^ < 1%). To produce a clean 
single-scale image XidjC an " pixels with Ixj<TUxj are zeroed, 
which (ideally) leaves non-zero only those pixels that belong to 
significant peaks from sources. Several examples of clean im- 
ages are displayed in the last five panels of Fig. 3. 

In addition, the low-intensity pixels \I^j\ < Tu^j define the 
single-scale images of the residuals Tad/r* as well as the re- 
constructed image of the residuals J^dr = Yuj -Tid /r- The images 
are shown in Fig. 4, where one can clearly see that the single- 
scale residuals are much "flatter" than those accumulated over all 
scales. This illustrates why the single- scale cleaning can be per- 
formed on the entire images using thresholding, in contrast to the 
full images (for more illustrations, see Figs. 2, 5). Furthermore, 
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Fig. 5. Using skewness and kurtosis for iterating accurate thresholds TJ Aj in the cleaning process (Sect. 2.3). The upper panels show histograms 
for the full image I AD at 350 yum (left) and for the decomposed images I ADj at the 9" and 36" scales (middle, right) before cleaning. The lower 
panels display histograms for the reconstructed residuals I A br (left) and for the residuals IxojK of the two decomposed images of the upper panels 
(middle, right). The vertical lines in the left panels indicate the standard deviation cr A (short dash, green) and 6 cr A (long dash, magenta) computed 
in the full image I AD . In the other panels they indicate the converged values of the standard deviation cr A j and 6 cr A j in the single-scale images I AD j, 
as well as the final thresholds TU Aj (solid, red). The histogram of the residuals I A br of the cleaning process, reconstructed from all spatial scales 
(lower left) has much greater symmetry and resembles a Gaussian distribution, whereas the histogram of the full image I AD (red, copied from the 
upper-left panel) is highly asymmetric. Both s™ ax and k™ x have a value of 3.17 and the corresponding variable factors n Aj have the values of 4.52 
and 4.09 for the two single-scale images. The width of the intensity bins is 1 MJy/sr in the left panels and 0.004 MJy/sr in all other panels. 



the flattening step of our method (Sect. 2.8) ensures that also 
the standard deviations of the single-scale residuals become as 
uniform as possible over the entire image. 

Starting with the second scale, j = 2, the factor n^ is allowed 
to become lower than its initial value of n / a=6at j=l, depend- 
ing on the image and the scale. Being an important parameter 
for the iterations to converge to accurate cut-off levels, n^j must 
be accurately chosen. Empirical evidence shows that if n^j were 
smaller than an optimal value, the iterations would converge to 
TDxj that is too low, resulting in noise peaks contaminating the 
clean images. On the other hand, if n^j were larger than its op- 
timal value, the iterations would converge to a value of m^j not 
deep enough, thus some fainter sources present in I^j would be 
missing in the clean images Imjc- 

To determine the appropriate values of n^, one can use the 
higher-order statistical quantities, skewness and kurtosis 

SAj =V3Aj 0~fj, k Aj =fi4Aj cr^j - 3, (3) 

where ji^j and jiuj are the third and fourth moments about the 
mean (both s A j and k A j are zero for a standard normal distribu- 
tion). The idea is that when the pixel distribution of the resid- 
uals Iadjr becomes too asymmetric (large \s A j\) or too peaked 
(large kxj), the optimal value of n A j must actually have been 
lower. Thus, having iterated the cut-offs tux\ at the first scale, 



getsources computes the upper limits to s^j and k A j given by an 
empirical formula 

^max = ^max = max | 2 14 ln|^- + 22oj - 1 1.3, 0.25 j, (4) 

where is the maximum pixel intensity over I Am Ma- 

When iterations at scale j converge to a threshold tu^, get- 
sources computes s^j and kxj in the image of the residuals I^d/r. 
If \s Aj \ > s™ x or k Aj > k™ x , the algorithm slightly (by a few per- 
cent) reduces n A j, whose initial value is n^j-i from the previous 
scale, and re-iterates ruxj- This procedure ensures that sxj and 
kxj always stay within the empirical bounds in the process of 
obtaining the thresholds and cleaning the single- scale images. 
Extensive experimentation has shown that the limits of Eq. 4 
work very well for all images that we have tested 13 . 

The pixel distributions shown in Fig. 5 illustrate the clean- 
ing algorithm. The histogram for the original image I^d shown 
in the upper-left panel contains all spatial scales and is there- 
fore very wide and asymmetric; it cannot be used to separate 

13 We have applied getsources to the multi-wavelength images of a 
dozen of simulated fields, the ground-based (sub-)mm images of NGC 
2068, NGC 2071, NGC 2264, W 43, and the Herschel images of Aquila, 
Cepheus, Cygnus X, IC 5146, Lupus, M 16, NGC 4559, NGC 7538, 
Orion B, Perseus, Pipe, Polaris, RCW 79, RCW 82, RCW 120, Rosette, 
Taurus, Vela, W 3, W 48. 
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Fig. 6. Two types of combined single-scale images (Sect. 2.4). Schematically shown are the image JdjC (left) used to detect sources 
evolution of their shapes and the image I' DjC (right) used to determine the characteristic scales and initial footprints. 
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sources from noise and background using global thresholding. 
All scales are blended together in such images and any thresh- 
old would enable one to either find only the brightest peaks los- 
ing most fainter sources or create many spurious peaks from the 
pixels belonging to the variable background or noise. In contrast, 
the highly-filtered single-scale images IxDj contain only narrow 
ranges of spatial scales and thus the histograms of the residu- 
als J^djR (representing the background and noise fluctuations) 
are much narrower and symmetric, resembling a Gaussian dis- 
tribution (Fig. 5). Having no signals from all irrelevant larger 
scales, the images are much "flatter" and therefore well suited 
for the cleaning algorithm. The single-scale histograms show 
that using the upper limits of Eq. 4 for skewness and kurto- 
sis helps in correcting the variable factor nxj and thus in get- 
ting deeper thresholds ruxj and much better detection of fainter 
sources while avoiding creation of spurious sources. 

2.4. Combining clean single scales over all wavelengths 

The cleaning algorithm outlined in Sect. 2.3 is applied to the 
single-scale detection images IxDj independently in each wave- 
band. Clearly, getsources also works the same way for just one 
image at a single wavelength. It is necessary to fully com- 
plete "monochromatic" extractions as they provide the footprints 
of all sources and also the estimates of the sizes A™ ax of the 
largest sources, the information used by our flattening procedure 
(Sect. 2.8). Combining wavelengths means utilizing all informa- 
tion across all bands and this should be used to improve the de- 
tection and measurement qualities over the simple approach of 
matching separate catalogs obtained in each waveband indepen- 
dently. Whereas combining independent catalogs on the basis 
of the association radius is possible for a few images obtained 
with similar angular resolutions, this usual approach of associ- 
ating sources between wavelengths would introduce large and 
unknown errors when applied to Herschel images whose resolu- 
tions differ by as much as a factor of ~7. 

In general, it is impossible to combine images at different 
wavelengths in a meaningful way using full images contain- 
ing signals on all spatial scales. The fine spatial decomposi- 



tion employed by getsources (Sect. 2.2) that filters out signals 
from all irrelevant scales, together with the single-scale clean- 
ing (Sect. 2.3) enable us to create wavelength-independent clean 
images that accumulate only significant intensity peaks from all 
wavelengths, representing potential sources. The combined im- 
ages must be normalized because of highly varying intensities in 
different bands; there is no need to preserve the spectral behav- 
ior of sources in the images used only for detection (Sect. 2.5). 
Indeed, the wavelength-dependent properties of all sources will 
be measured in the observed images Ixo (Sect. 2.6) after all 
sources have been detected. 

The cleaning procedure described in Sect. 2.3 works well 
when the small-scale noise and background fluctuations are rel- 
atively uniform across the image and there are no strong arti- 
facts. It is not unusual, however, for the observed images to con- 
tain quite variable noise and various types of artifacts, including 
those from the map-making process. In order to reduce possi- 
ble contamination of the clean images with the pixels belonging 
to the noise peaks or artifacts, getsources additionally employs a 
lower limit on the number of pixels Nju in a cluster of connected 
pixels that may remain in single- scale images: 



max 



1 [O x 



(5) 



where Ox is the observational beam size, A is the pixel size, and 
the default value of the parameter A^ n is 4. Small clusters with 
Nm < A^m 11 are removed from the single-scale images IxnjC 
before the latter are combined. 

When combining single scales from different wavebands, 
getsources only sums up limited ranges of spatial scales, de- 
pending on the angular resolution and the sizes of sources that 
can be found in the images. The range of scales is limited 
from below because the smallest scales may contain decreas- 
ing (and progressively noisier) contribution for the images ob- 
tained at longer wavelengths with poorer resolutions. Single 
scales within a factor of 3 below the nominal resolution in each 
band may still contain considerable signal from the sources. 
Accordingly, the lower limit of scales Sj being combined de- 
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Fig. 7. Combination of single-scale images (Sect. 2.4). The field of Fig. 2 is shown as source masks Mbc accumulated over all scales and 
wavebands (upper left) that give a summary view of how the sources, made visible by the cleaning (cf. Fig. 3), change their shapes and sizes. The 
same set of spatial scales is displayed in the combined clean single- scale images I^jc (left to right, top to bottom) that accumulate information at 
those scales from all wavelengths. For better visibility, the values displayed in the masks image are limited to 300 and in the normalized images 
J DjC they are limited to 0.07, 0.3, 0.3, 0.6, and 0.6. The scale sizes Sj are visualized by yellow-black circles and annotated at the bottom of the 
panels. The color coding is a logarithmic function of intensity. 
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Fig. 8. Evolution of sources in the clean single-scale images (Sect. 2.5). The full image of the source at 350//m with a size of 3'7x3'7 (left) has 
been cut out of the corresponding panel in Fig. 2 (the source is located south-east of the image center, it is resolved in all wavebands). The other 
panels show the source in single-scale images Ixdjc at the scales of 18, 36, 138, and 199", maximum intensities in the panels being 0.31, 0.89, 
3.09, and 1.84 MJy/sr, respectively. The scale sizes Sj are visualized by the blue circles and annotated at the bottom of the panels. The color coding 
is a linear function of intensity. 



faults to max {S\, 0.33 O^}, allowing for a substantial "super- 
resolution" in getsources. On large scales, the range of scales 
is also limited by the (wavelength-dependent) maximum sizes 
^max Q f sources ( see Sect. 2.2). Initial guesses for A™ ax are 
given as input to getsources, however, they are accurately de- 



termined during the initial extraction and used in the final ex- 
traction (Sect. 2.8). 

It is necessary to define two sets of wavelength-combined 
single-scale images (illustrated in Fig. 6) for different purposes. 
The images JdjC are used to detect sources and track the evolu- 
tion of their shapes across all spatial scales (see Sect. 2.5) and the 
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Fig. 9. Single-scale source detection (Sect. 2.5). The field of Fig. 2 is shown as the initial footprint image Td after the source detection (upper 
left). The same set of spatial scales is displayed in the single-scale segmentation images J DjS (left to right, top to bottom) showing the source 
segmentation masks determined from Injc (Fig- 7). The masks were obtained and analyzed by the detection procedure described in Sect. 2.5. The 
scale sizes Sj are visualized by the yellow-black circles and annotated at the bottom of the panels. The color coding is a function of the square root 
of the source number (which makes up the actual pixel values in these segmentation images). 



images I' D j C are used to follow the dependence of the peak in- 
tensities of detected sources on scales. The first set of combined 
detection images is normalized such that all cleaning thresholds 
become equal to 1 in each band: 



^Yj^ m ^{ T ™jC,Txj 



N 



(6) 



where Txj is the threshold image (all pixels of which are equal to 
mxj), N is the number of wavelengths, and fxj gradually "turns 
on" the scales smaller than the observational beam Ox (for larger 
scales, f X] ? = 1): 



fAj ~[o: 



max 



{Si, 0.33 0*} <Sj<0 A . 



(7) 



The renormalization of images, the threshold tjxj in the curly 
brackets, and the turn-on factor fxj ensure that the images from 
different wavebands become smoothly combined in IvjC (Eq. 6) 
with no discontinuities and that there are no large changes in the 
combined images between any two adjacent scales (Fig. 6, left). 

The normalization of IxDjc to a common threshold before 
summing them up is the most natural way of combining wave- 
lengths to maximize sensitivity. It removes, however, the natural 



dependence of the source brightness on scales, which is used by 
our detection algorithm to determine the characteristic scale and 
initial footprint for each source (Sect. 2.5). Our second set of 
combined images allows that, as they are normalized only at the 
smallest scale (there is no reason to use the factor 1 /N here): 



DfC" 



y- 

1 ii 



Wa t 

rmax " ^Dj'CS 
X iDIC 



(8) 



where Ij^c * s me max i mum intensity over I^DjC and the weight 
wx enhances contributions of the images with higher angular res- 
olutions: 



(9) 



where O = N~ 1 Y,x Ox is the average observational beam size 
and y > 1 is a weighting exponent with a default value of 6. The 
aim of such weighting is to effectively separate the contribu- 
tions of different wavebands over the intensities when computing 
I' D j C (Eq. 8). After the weighting, the summation of IxDjc prac- 
tically does not alter their individual intensity profiles (Fig. 6, 
right). If a source exists in some high-resolution single- scale im- 
ages at a certain wavelength, the dependence of its peak intensity 
across scales is fully preserved in I' D j C . In the detection process, 



11 



Men'shchikov et al.: A multi-scale, multi- wavelength source extraction method 



the source position, characteristic size, and footprint are largely 
determined by the high-resolution images and the spatial scale 
where it becomes the brightest (Sect. 2.5). The initial size and 
footprint obtained in the detection step are recomputed during 
the measurement iterations (Sect. 2.6). 

Figure 7 illustrates our method of combining wavelengths. 
The upper-left panel shows an image of single-scale masks accu- 
mulated over scales and all Herschel wavebands, obtained from 
the clean single-scale images: 

where Tij is the threshold image and, of course, the division of 
the image by itself can only be done in non-zero pixels. The ac- 
cumulated mask image presents a cumulative view of how the 
sources made visible by cleaning change their shape and size 
across all scales and wavelengths. All sources that can possi- 
bly be found in the clean images have left their mark in Mnc- 
The other panels of Fig. 7 display the combined detection im- 
ages Jdjc f° r several single scales, which accumulate informa- 
tion from all wavebands. Comparing them with the correspond- 
ing panels of Fig. 3, one can verify that each combined scale is 
indeed populated by more sources than any of the clean single 
scales at individual wavelengths. 

2.5. Detecting sources in combined single-scale images 

As can be seen in Figs. 3 and 7, sources "evolve" from small 
to large scales in single- scale images. In the clean images Jd/c» 
the sources appear at some relatively small scales (smaller than 
their actual size), become the brightest at a scale roughly equal 
to their size, and eventually vanish at significantly larger scales 
(see Fig. 8 for an illustration). The idea of our source detec- 
tion scheme is to analyze all Jd/c = 1» • • • »^s)» identifying 
the sources and tracking the "evolution" of their shapes from 
small to large scales. For that purpose, getsources employs the 
Tint Fill algorithm (Smith 1979) 14 , developed for coloring arbi- 
trary shapes in digital images. The algorithm assumes that the 
sets of pixels belonging to any shape are 4-connected, i.e. that 
for any pair of pixels 11/ and IY m in the shape, there is a path 
from It/ to n m through pixels in that shape, such that neighbor- 
ing pixels in the path are connected to each other only by their 
sides (Fig. 10). Given a set of 4-connected (side-connected) pix- 
els, each one having the same property (e.g., color), the algo- 
rithm fills all (and only) pixels of that shape with a new value 15 . 
The algorithm has been slightly generalized to be suitable for the 
source detection in single-scale images by replacing color with 
another pixel property of having a non-zero intensity. 

The modified version of the Tint Fill algorithm looks, at each 
scale (7=1,2,..., Ns), for 4-connected areas of non-zero pixels 
in Jd/c an d assigns the value of the running source number i 
to each of those connected pixels, producing a segmentation im- 
age Jd/s- The latter consists of the segmentation masks of the 
sources found at scale j and previous smaller scales (getsources 
always analyzes single-scale images from small to large scales). 
A source mask is the area of 4-connected pixels (with values i) 
in Jd/s> a ll those pixels having non-zero intensity in Jd/c at j 
or at smaller scales. The masks uniquely identify the sources and 
they allow tracking them across all single scales (Fig. 9). 



14 Available at http://portal.acm.oig/citation.cfm?id=800249.807456 

15 Identification of distinct connected regions in similar algorithms is 
also known as connected-component labeling. 



In order to better disentangle and follow the evolution of 
blended sources in JdjC, the algorithm splits the images be- 
tween their maximum and mj by a number of intensity levels 
ajji with a spacing Slnojji = 0.1 or smaller. At each scale j, our 
source detection procedure works on a sequence of "partial" im- 
ages Jd/cz = max {Jd/c, Wji}, from top to the bottom, where 
the last partial image is the entire single-scale image Jd/c- For 
better efficiency, we only consider those levels ojji that increase 
the number of pixels in Jd/cz with respect to Jd./cz-i by at least 

6N™ = max |^(§) 2 . J • D 

where A is the pixel size and 6N£ an = 9. The levels coji with 
SNjij < SN™ n are skipped by the detection algorithm. Processing 
a partial image Ir>jci within scale j, getsources first gives the 
coordinates of the sources already found at previous scales and 
levels to the modified Tint Fill algorithm and the latter fills the 
evolved shapes of the "old" sources with their numbers i. Then 
it checks all remaining pixels of Jdj C/ to find new 4-connected 
areas of non-zero pixels that first appear at the current scale j; 
when found, the segmentation mask of each "new" source is 
filled with its new number i. 

When an isolated source vanishes at a certain scale, its pix- 
els are freed and may eventually become occupied by its neigh- 
bor or another (significantly larger) source that may appear there 
at a larger scale. One of two touching sources can also disap- 
pear from larger scales, when it becomes fainter at progressively 
larger scales and finally merges with the brighter neighbor (i.e., 
when the saddle point between the peaks disappears). When two 
or more isolated sources become connected in a single- scale 
image, their segmentation masks are not allowed to overlap. A 
boundary between two sources is maintained along the normal 
to the straight line connecting their centers (the normal being de- 
fined at a position along the line where intensity is at minimum); 
the sources can still grow at larger scales in the perpendicular 
direction. This boundary exists, however, only within a limited 
range of scales, until one of the touching sources vanishes or 
merges with another. The algorithm is perfectly able to handle 
hierarchical structures, as the segmentation masks can overlap 
at different scales for sources of significantly dissimilar dimen- 
sions. Whether a larger source containing smaller sources is de- 
tected as such or considered as the background, depends on the 
relative difference in the characteristic scales between the large 
and small sources, on the location of the small sources within the 
larger structure, and on the relative brightness of the peaks. If the 
source sizes are significantly different, the hierarchical structures 
are detected; a number of examples can easily be found in Fig. 9 
(upper-left), Fig. 17 (right), and Fig. 18 (bottom). 

One can show that a resolved isolated circular source i with 
the FWHM sizes A t = B t would have its maximum peak inten- 
sity lij in a single-scale image with a smoothing beam Sj «A;. 
Recall that the spatial decomposition (Eq. 1) is based on con- 
volution; the latter acts as a natural selector of scales in the de- 
composed images (cf. Sect. 2.2). Indeed, convolving with small 
beams (Sj<£A t ) would have almost no effect on the source, 
whereas using extended beams (Sj^>A t ) would greatly dilute 
the source. In both these extremes, sucessive unsharp masking 
produces decreasing peak intensities (Fig. 8) while creating the 
strongest signal for the sources with sizes A;«S/; completely 
unresolved sources are the brightest at spatial scales Sj ^ A . 

The scale 7V, where a source is the brightest, provides the 
best initial estimate for its actual FWHM size A t (= B t ) = Sj F . 
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Fig. 10. Topology of the pixel connections (Sect. 2.5). Pixels of the red 
shape are 4-connected and thus the shape may represent a source. Pixels 
of the dark blue shape are not 4-connected: there are no paths connect- 
ing any pair of the blue pixels such that any neighboring pixels in the 
paths are connected to each other only by their sides. Note, however, 
that there are two 4-connected sub-areas in the blue shape, which could 
represent sources. 
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This footprinting scale defines a source footprint, i.e. the entire 
area of the image pixels making non-negligible contribution to 
the total flux of the source. For unresolved sources, the foot- 
prints cover circular areas « nO\, whereas for well-resolved 
sources with intrinsically Gaussian intensity distributions, they 
cover elliptical areas « nAiBf. During the detection process, get- 
sources creates initial footprints (cf. Fig. 9, upper-left) with full 
sizes of A/f (= = 1.15 (2Sj F ) = 23 Sj F , where the empirical 
factor 1.15 was found to improve results in our tests. The foot- 
prints generally become elliptically- shaped in the measurement 
iterations (Fig. 15), reflecting the source shapes obtained from 
intensity moments (Sect. 2.6). 

In a multi-wavelength extraction, the combined images are 
sums of rescaled images over all individual wavebands (Eq. 8) 
using the weighting (Eq. 9) that effectively separates IxojC m 
the combined detection images I' D j C . Although the weighting 
biases Sj F towards higher-resolution images (makes the initial 
footprints smaller), this does not affect the results, as the sizes 
found in the detection process are replaced with the values ob- 
tained from the monochromatic images Ixojc during the mea- 
surement iterations (Sect. 2.6). 

Coordinates of a source are computed from the moments of 
intensities (Appendix F) in a limited range of scales, only up to 
a scale 4 times larger than the one it first appeared at or up to 
the footprinting scale j-p, whichever is smaller. This gives more 
accurate positions than if recomputed at even larger scales, since 
the latter tend to mix in more and more of the signals from the 
surroundings and thus distort the single-scale intensity distribu- 
tion of sources. To improve the positional accuracy even further, 
getsources uses only the pixels with values greater than Iy/IA, 
where is the peak intensity of a source i and the number in the 
denominator has been found empirically. 

In general, observed images contain structures at all scales 
and there is a real danger of creating spurious sources by mistak- 
enly detecting noise peaks or background fluctuations that hap- 
pen to lie on top of larger structures. Although the latter may 
be relatively faint, they tend to "amplify" the small-scale noise 
and background and make the small structures appear as sig- 
nificant sources. This can be especially problematic if the local 



Fig. 11. Measurement iterations (Sect. 2.6). The processing block of 
measurements from Fig. 1 is shown at the top. It is sub-divided here 
in 5 algorithmic steps that are executed in iterations, until the foot- 
print images T\ have converged, i.e. stable distributions of source foot- 
prints at each wavelength have been obtained. The deblending block 
itself includes iterations to disentangle contributions of many overlap- 
ping sources to the intensity of a pixel that belongs to all of them. 

uncertainties of the peak intensities (Sect. 2.6) are not possible 
to estimate due to crowding, as the fluctuations outside the large 
structures may be noticeably smaller and thus the signal-to-noise 
ratio (S/N) may easily be overestimated. 

There is a mechanism in getsources to greatly reduce this 
possibility. The idea is that small-scale peaks on top of larger 
structures tend to survive up to larger scales than they would do 
otherwise, without the "support" of the underlying structures. 
The latter tend to make the peaks eventually touch each other 
or merge at some spatial scale and this is used by the algorithm 
to identify and discard insignificant intensity peaks. Noise peaks 
on top of larger structures become diluted if considered relative 
to the intensity level contributed by the larger structure. 

When getsources analyzes the intensity levels coji of the par- 
tial images Td/cz> it finds the level where two or more sources 
first touch each other and computes the contrast Q = Iij/a)ji, 
where is the source maximum intensity in the image. The 
contrast of touching sources is checked to decide whether the 
sources should be treated as significant ones. The basis for this 
decision is the fact that when an insignificant source touches an- 
other source (either real or spurious), its contrast becomes quite 
low due to the underlying intensity of a larger structure. In prac- 
tice, getsources requires that any real source must have its con- 
trast Ci above the minimum value 

Cf^ n = J] max {Qa Qe, 1} , (12) 

where rj is the configuration parameter of getsources (with the 
default value of 1.35), Qa is the amplification factor, and Qe is 
the elongation factor: 

c '-H5(£)4 c, - m {s(!)4 <i3 » 
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Fig. 12. Background interpolation scheme (Sect. 2.6). The central red 
pixels belong to the source, defining its footprint in this example, 
whereas the surrounding blue pixels are those of the background. At 
each pixel of the source, the background is linearly interpolated in the 
four main directions based on the values of the pixels just outside the 
footprint (highlighted in darker blue), the resulting four values per pixel 
being averaged. Such interpolation probes the actual background varia- 
tions around the sources, and thus the interpolated background is more 
complex and realistic than just a planar surface. 



where Fn is the flux integrated over the source segmentation 
mask below the current intensity level coji (at which the source 
first touches another source) and is the flux integrated above 
that level; ai and b[ are the major and minor lengths of the source 
segmentation mask. These factors describe two aspects of the be- 
havior of noise peaks on top of larger structures: the amplifica- 
tion factor increases with a stronger contribution of the underly- 
ing structure, whereas the elongation factor becomes large when 
the structure has filamentary shape. For Qa Ge < 1, the source 
is considered real if its contrast C\ > r] (Eq. 12); larger values of 
the product Qa Qe raise the "barrier" higher. For Q < C mm the 
current source is flagged as spurious and its segmentation pixels 
are freed to be used by more prominent sources. If some or all of 
the touching peaks have contrasts above C™ m , they survive the 
test and their evolution is followed further to larger scales. 

Having detected all sources in the combined images Jd/c 
over all spatial scales of interest and collected the information 
in a detection catalog, getsources now returns to the actual ob- 
served images Ixo in each waveband for measurements of the 
basic properties of the sources, such as their peak intensity (flux), 
total (integrated) flux, and size. 

2.6. Measuring and cataloging properties of sources 

All measurements are performed in the observed images Ixo for 
known positions feyO of all sources, obtained in the detection 
process (Sect. 2.5) and not recomputed anymore 16 . The measure- 
ments must be done together with the background subtraction, 
deblending, and improvements of the footprints; these are non- 
trivial interconnected problems that require iterations (Fig. 11). 

To properly measure parameters of a source, one has first to 
determine and subtract the background. As discussed in Sect. 1, 
we define sources as significant intensity peaks detected by the 



16 Positions were derived using filtered detection images and recom- 
puting them from the observed images contaminated by irrelevant spa- 
tial scales would not make them more accurate. 



algorithm and whose entire contribution to the image is bound 
by their footprints. Based on this definition, getsources deter- 
mines the background by linearly interpolating pixel intensi- 
ties in Ixo under the source footprints (Figs. 12,15). The in- 
terpolation is done in the four main directions (two image axes 
and two diagonals), based on the pixels just outside the foot- 
prints, which do not belong to any source. For each pixel, values 
from the 4 directions are averaged to produce the background 
intensity at the pixel. This procedure results in the image of 
clean background Ixo cb and the background-subtracted image 
Xiobs = Ixo - Xiocb (Figs. 14,17). In very crowded areas of im- 
ages with many overlapping sources it is not possible to probe 
the background around each of the sources. In this case, the in- 
terpolation gives the best possible background estimate based 
on the nearest background pixels available around the blended 
areas 17 . 

Having created the background- subtracted images I^obs, the 
algorithm deblends values of pixels in overlapping sources and 
computes peak intensities F iA p, and total fluxes F iA T for each 
source i. At the first measurement iteration, it uses the initial size 
estimate Sj F obtained during the detection (Sect. 2.5), whereas in 
the subsequent iterations, the size and orientation from a previ- 
ous iteration are used. Our iterative algorithm employs deblend- 
ing shapes, the two-dimensional analogs of the function (Moffat 
1969) 



IiAM = Fup (l + (r/Ro) 2 ) 



(14) 



where r is the radial distance from the peak and Rq is a function 
of the FWHM shape (Aa, B iA , Qa) of a source i. The power-law 
exponent is fixed at f = 10 to have stronger, more realistic wings 
compared to the exponential wings of a Gaussian (£ — > oo) 18 . 
The deblending shapes (Eq. 14) are used to split the intensity 
^obs of a pixel between the source i and all overlapping sources 
according to a fraction of the profiles' intensities at that pixel: 



Z VkAM\ 

k 



UOBS» 



(15) 



where the summation is done over all sources whose footprints 
contain the pixel (Figs. 13, 14). The deblending iterations start 
with the original (blended) intensities Ixobs at the positions of 
each source and perform the splitting of the pixel values until 
the intensity Ia{xi,yd at the center of each source converges to 
its deblended peak intensity Fap- The deblended intensities la 
within the footprint ellipses (Fig. 15) are then used to integrate 
the total fluxes FaT, as well as their FWHM shapes (Aa, Ba, 
Qa) from the intensity moments (Appendix F). 

Local uncertainties of the peak intensities F a v are given by 
the standard deviations cr UP estimated in the observed images 
Ixo in an elliptical annulus defined around each source i just 
outside its footprint 19 . To ensure that the uncertainties are statis- 



17 This simple method works well (as long as one determines accurate 
footprints) and it is sufficient, as the background under sources is funda- 
mentally unknown. Estimates of the background based on more compli- 
cated approaches, such as its approximation by some two-dimensional 
functions, always involve assumptions and free parameters, and our 
simulations show that they may well be less accurate than the simple 
linear interpolation. 

18 As an example, the intensity of a circular I iAM at the footprint edge 
is by a factor of 1.565 higher than that of the corresponding Gaussian. 

19 This is equivalent to the standard approach of measuring flux er- 
rors for an isolated source. Heavily crowded fields present, however, a 
serious problem, as no source-free annulus exist around many of the 
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Fig. 13. Deblending overlapping sources (Sect. 2.6). Three identical sources (A, B, C) with the same sizes and with peak intensities normalized to 
unity are overlapping with their footprints. For clarity of the figure, the sources are not shown with their individual profiles, but rather with their 
blended intensity distribution SABC; with noise added, the profile transforms into EABCn. The deblending profiles A M , B M , and C M (Eq. 14) 
are defined at the source positions by the sizes and peak intensities estimated from the background- subtracted image J^obs- At each pixel, the 
deblending algorithm splits its intensity /sABCn between each of the overlapping sources, according to the fraction of the shape's intensity (Eq. 15). 
The resulting deblended profiles of each source, A D , B D , and C D , are shown in the right panel. 




Fig. 14. Deblending overlapping sources (Sect. 2.6). Background- subtracted overlapping sources at 350//m (left) are separated into two individual 
sources, cataloged under the numbers 244 and 242 (middle, right). This blended pair is clearly visible half-way north from the field centers in 
Figs. 2, 3, 7, 9, 17, 18; the annulus around the source 244 is highlighted in Fig. 16. Comparison with the known true model parameters shows that 
the peak intensities measured for these deblended sources are in error by -1.1% and -5.1% and the total fluxes were calculated with errors of 
-0.5% and -12%, respectively. The color coding is a linear function of intensity in MJy/sr. 



tically meaningful, the images of annuli Jix (Fig. 16) are con- 
structed by requiring that the area of any annulus must contain 
50 areas of the observational beam Ox- In the crowded fields, 
such as the one used for the illustrations in this paper, the foot- 
prints and annuli may overlap quite heavily (Figs. 15, 16) and 
therefore not all pixels can be used, as many of them belong to 
other sources. In such cases, getsources obtains an estimate of 
cr aP as local as possible by expanding the outer edge of the an- 
nulus outwards, until its usable area of non-zero pixels becomes 
50 A (Fig. 16, middle). 

Uncertainties cr ux of the total fluxes F iA T are estimated un- 
der the following assumptions. If a source footprint contains Nb 



sources situated within the regions. No relevant local values of the un- 
certainties can be found in that case, as more distant source-free areas of 
images are likely to have different properties in case of highly-variable 
background or noise. 



observational beams Ox, then the error of the sum of intensities 
over the footprint area will be the square root of the quadratic 
sum of the individual errors, since the beam measurements are 
statistically independent. Assuming the individual errors to be 
identical and equal to cr Z/lP , the total flux uncertainty is 

(AjRi B iF x) 1/2 

<r U T = <ru* 115(20a) • 06) 

where A^x and B iFA are the major and minor axes of the foot- 
print ellipse (cf. Eq. 20), Ox is assumed to be a circular Gaussian, 
and the empirical factor 1.15 has been introduced in Sect. 2.5. 

With the peak intensities and their uncertainties estimated 
for each source, one can define the standard S/N ratio = 
Fixp/o~ixp, to quantify how prominent the sources are against 
the noise and background fluctuations in their immediate envi- 
ronments. Conventional practice is to use the quantity for defin- 
ing reliability criteria to avoid contamination of the extraction 
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Fig. 15. Converged footprints of the measured sources (Sect. 2.6). The field of Fig. 2 is shown at 70, 100, 160, 250, 350, 500yt/m (left to right, 
top to bottom) as the footprints of all detected sources after the measurement iterations. The pixel values are the standard deviations cr aP , due 
to the local noise and background variations, estimated for each source in an elliptical annulus around its footprint (Fig. 16). Strongly elliptical 
or too large footprints may appear at those wavelengths where some sources are too faint to be measurable. For such sources, the information is 
essentially lost and the intensity moments cannot provide meaningful estimates of their sizes and orientation. The color coding is a function of the 
square root of intensity in MJy/sr. 
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Fig. 16. Annuli around measured sources (Sect. 2.6). The field of Fig. 2 is shown as the true intensities of the model sources at 350yt/m convolved 
to a 17" resolution (left), the image of annuli JKa of all detected sources (middle) slightly modified to highlight the annulus area around the source 
244 from Fig. 14, and the product Ma Iao (right) to visuaize the actual observed intensities used to compute the flux uncertainties cr aP shown in 
Fig. 15. The corresponding footprint images T\ are presented in Fig. 15 and the observed image Iao is displayed in Fig. 18. The color coding in 
the left panel is a function of the square root of intensity, in the other panels it is a linear function of intensity in MJy/sr. 
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Fig. 17. Background- subtracted sources (Sect. 2.6). The field of Fig. 2 is shown as the true intensities of the model sources at 350 yum convolved 
to a 17" resolution (left), the background- subtracted image J^obs at 350yt/m (middle), and the composite 3-color RGB image (500, 250, 160yt/m) 
created using the images of the deblending shapes of each extracted source (right). For the true model intensity distribution (no background) to be 
more comparable to the background- subtracted image, it is shown above 5 MJy/sr. The color coding is a function of the square root of intensity in 
MJy/sr. 



catalogs with spurious detections; at the same time, cr UP deter- 
mines the errors of the measured fluxes. However, in contrast to 
the traditional source extraction methods, getsources performs 
detection on highly- filtered images Ivjc that are quite different 
from the measurement images Ixo and it is important to make a 
clear distinction between the detection and measurement quali- 
ties. 

In the wavelength-dependent detection images Ixojc, we de- 
fine the contrasts Caj = Uxjl^xp similar to those introduced in 
Sect. 2.5. At the footprinting scale y'p the contrast is, within a fac- 
tor of nxj (Sect. 2.3), the monochromatic detection significance 

E^ = = n Aj C aj . (17) 

Although Ea is the single-scale analog of Q^, the quantity cr^ F 
evaluates the level of uncertainties at the footprinting scale in the 
single-scale detection images, whereas crap quantifies the level 
of fluctuations during measurements in the full observed images. 

Since getsources is a multi-wavelength extraction method 
and the detection is generally performed in the combined 
Extraction of an isolated source situated in approximately uni- 
form background and noise results in Ea~Qa. For sources in 
more complicated environments, however, Ea gives a cleaner 
and more accurate estimate of the detection quality and a bet- 
ter criterion for selecting real sources. The measured value of 
£la inversely depends on the fluctuation level of highly-variable 
backgrounds, such as those observed by Herschel in Galactic re- 
gions. The single-scale cleaning of I^Dj (Sect. 2.3) filters out all 
irrelevant spatial scales and thus substantially reduces the fluctu- 
ations outside sources. Therefore, the significance of detections 
by getsources may well be considerably higher than the conven- 
tional £la would indicate 20 . 

Since getsources is a multi-wavelength extraction method 
and the detection is generally performed in the combined im- 
ages Jdj C, it would be useful to define another quantity to mea- 
sure the significance of source detection more globally, across 

20 Simulations show that gives a notably more accurate represen- 
tation of the true model S/N and with a much lower dispersion than the 
conventional estimates do using the full images containing all spatial 
scales. 



all wavebands. To obtain a meaningful quantity, one cannot use 
the combined images, because they contain renormalized and 
summed up monochromatic images (Eq. 6). Assuming that we 
have a set of independent images I^d, it makes sense to define 
the global significance as 

We consider two levels of the robustness of source detection: 
the reliable level H re i and tentative level S ten (default values of 
7 and 5, respectively). Reliable sources, i.e. those with Ea > S re i 
in at least one waveband used for detection, are cataloged with- 
out checking whether they are detected at any other wavelength. 
Tentative sources, i.e. those with Ea < S re i in all wavebands used 
for detection, are kept in the final catalog only if Ea > H ten n~^ 2 
in at least n^ Qt wavebands (n^ Qt defaults to 1 and 2 for the 
monochromatic and multi- wavelength extractions, respectively). 
For any cataloged source, E t > S ten , and for reliable sources, 
E t > H re i. 

Having measured the properties of all detected sources at 
each wavelength, getsources completely removes those that are 
likely to be spurious. The algorithm treats removal of such 
sources with great care, dividing the measurement iterations into 
three phases. During the first phase, only non-detections are re- 
moved from the catalog, i.e. those sources that do not fulfill the 
above requirement of the simultaneous detection in at least n^t 
wavebands, all other detected sources are given a chance to con- 
verge. During the second phase, the algorithm removes sources 
with extremely low goodness (G/< 0.01) which we define as 

where Ri = E t Q Cr^ is the source reliability, Q/ is the global 
signal-to-noise ratio defined analogously to the global signifi- 
cance Sf (cf. Eq. 18), and Qej f is the elongation factor (Eq. 13) 
at the footprinting scale . Finally, during the third phase, 
sources are also removed when their position is too close to an- 
other source (within max {1.5, (O m i n /A)/3} pixels, where O m { n = 
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Fig. 18. Visualization of the measured and cataloged sources (Sect. 2.7). The field of Fig. 2 is shown at 70, 100, 160, 250, 350, 500 /urn (left to 
right, top to bottom) with the extraction ellipses (FWHM) of the measurable sources (F iA p > era? and F aT > cr ax ) overlaid on top of the observed 
images I A0 . The default condition, that a tentative source must be detected in at least two bands, was used. Only the protostellar cores are visible 
at 70 and lOOyum, whereas at 160yum cold starless cores starts to appear, becoming clearly visible in the SPIRE bands. For better visibility, the 
values displayed in the panels are somewhat limited in range; the color coding is a function of the square root of intensity in MJy/sr. 



min {Oa}) of almost the same size (within a factor of 2) and they 
have a lower value of E* than the other source. 

Since getsources is a multi-wavelength extraction method, 
globally good sources may well be detectable (E^ > E ten n~^ 2 ) 
or measurable (Q^ > 1) in only a limited number of wavebands. 
Indeed, it is quite usual that sources are either insignificant or 
non-measurable at some wavelengths and this generally leads to 
their footprints being very unreliable. In order to produce most 
accurate measurements, the footprints of such sources are re- 
moved from the corresponding footprint images T\ in that wave- 
band. This becomes very important in crowded regions with 
many overlapping sources. Since the quality of both the back- 
ground subtraction and deblending depends directly on the ac- 
curacy of the accumulated footprints, Ta must always remain 
as clean as possible, free of the footprints of insignificant or 
non-measurable sources. This wavelength-dependent removal is 
done starting from the second phase; however, the measurements 
of such sources are still kept in the catalog from the first phase 
of iterations. 

After the removal of bad sources, getsources analyzes the 
spatial distribution of all sources at each wavelength and flags 
them to provide some useful information on global properties of 
a source. The single-digit flag f is assigned to sources according 



to the following definitions. A source is called isolated (f = 0) if 
it is not blended with any other source at any wavelength. Two 
sources are called blended (fi = l) if the intersection area of their 
footprints is greater than 20% of either of the footprints in at 
least one waveband. A source is called sub-structured (f = 2) 
if a footprint of at least one smaller source is fully contained 
within the inner 56% of the footprint area (or within 0.75 A* fa 
and 0.75 5,- fa). A source is called sub -structuring (fi = 3) if it 
causes another source to be flagged as sub-structured. In addition 
to the global flag f, we also define the monochromatic flag fa 
that carries information on the wavelength-dependent properties 
for each source. Among other details, the flag identifies sources 
that are insignificant or non-measurable in a waveband. 

Further, getsources updates the extraction catalog, where 
each line contains the source number, coordinates, world coordi- 
nates (using the xy2sky utility, Mink 2002), global significance, 
flag, and goodness, followed by the measured properties at each 
wavelength: 

i Xi yi OLi St E i f Gi ( E a fix Far cr J/lP F^t o~ lit Ma ^ix ®a )n • 

The standard signal-to-noise ratio is not cataloged, because 
it can easily be derived from the catalog entries. The last pro- 
cessing block of each measurement iteration updates the accu- 
mulated footprints Ta of all sources, based on the latest values of 
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the source sizes and position angles. Full sizes and orientation 
of the footprint ellipses are computed from 

A iFA = 2.3 max [o A , S jpA } , B i¥A = A iFA , Q iFA = Q iA . (20) 

Having obtained the improved footprints, getsources checks 
whether the total number of non-zero pixels in <F A has notice- 
ably changed with respect to the previous iteration. If so, the 
measurement iterations continue until the total area of all foot- 
prints changes by less than 0.1%. Convergence properties of the 
measurement iterations vary for different fields; somewhere be- 
tween 5 and 30 iterations may be required to obtain a stable pat- 
tern of all footprints for each waveband (Fig. 15) and to produce 
the final catalog. In the last iteration, getsources creates an ad- 
ditional catalog of colors from all possible combinations of total 
fluxes F iA j over all A, a catalog of peak and background intensi- 
ties, as well as three versions of the azimuthally-averaged inten- 
sity profiles (I ao, Xiobs, Xiobsd) computed from the original, 
background- subtracted, and deblended images, respectively. 

2. 7. Visualizing extractions 

In order to facilitate visual analysis of the extraction results, get- 
sources produces a number of useful images for each waveband. 
These include (but are not limited to) the background- subtracted 
images T A d bs ? -Tiobs (Fig- 17), observed images I A q with the ex- 
traction ellipses overlaid on top (Fig. 18), and detection images 
Ixd (Fig- 19). Other images show just the central positions of the 
detected sources on top of the images I^dbs and I^obs; these 
are useful for visualizing crowded regions, where very little can 
actually be seen under the ellipses. Furthermore, the source im- 
ages -T^obsd display the background-subtracted, deblended in- 
tensity distribution for each individual source at each wavelength 
(Fig. 14). 

For an easy visualization of the various steps of the algo- 
rithm, getsources produces also a number of useful additional 
images. These include (but not limited to) the interpolated back- 
grounds Xidcb, Xiocb, the images of cr iAV (Fig. 15), the an- 
nuli Ji A (Fig. 16), the deblending shapes I A m (Fig. 17). The 
clean single-scale images Ijojc (Fig. 3), JdjC (Fig. 7), the ac- 
cumulated footprints Td (Fig. 9), and segmentation images Jd 7 s 
(Fig. 9) also remain available for an in-depth analysis and under- 
standing of the extraction process and results. 

2.8. Flattening background and noise 

Despite the fine spatial filtering employed by getsources, there 
is one common problem that still needs special treatment. It 
is known that the Herschel images of Galactic regions often 
show highly- variable backgrounds. The standard deviations of 
the combined background and noise fluctuations outside sources 
may differ by orders of magnitude between various areas of a 
large image I AD . Any simple thresholding method would have 
a difficulty in handling such images, as the global thresholds 
would not be good for all areas. Although the single- scale de- 
composition used in our method makes the images I AF) j much 
"flatter" and more suitable for use with global thresholding, it 
cannot overcome the problem completely, as the backgrounds 
fluctuate on all spatial scales, from the smallest to the largest 
ones. If the background or noise fluctuations strongly vary be- 
tween some areas of the image I AF> , they will still influence the 
intensity distributions of the single- scale images I^j, although 
to a much lesser degree. 



Prepare observed & detection 
images, observational masks 

INI Decompose •Clean •Combine* 
Detect • Measure • Catalog • Visualize 

Flatten detection images' noise 
and background fluctuations 



FIN Decompose • Clean • Combine • 
Detect • Measure • Catalog • Visualize 



Fig. 20. Flattening of the detection images (Sect. 2.8). The getsources 
algorithm requires two complete extractions, the initial and the final 
extractions (red blocks, expanded in Fig. 1; the preparation steps are 
shown in blue). 

Our method adopts a special two-step approach (Fig. 20) 
to overcome the problem; essentially, two complete extractions 
are performed instead of a single one. The initial deeper extrac- 
tion 21 aims at finding all possible candidate sources, in order to 
remove them and create the cleanest background images, free 
of any sources. The background images are then used to cre- 
ate the standard-deviation images D A and convolve them with a 
Gaussian beam Q^, producing the scaling (flattening) images 
I A y (see below for details). Dividing the detection images by 
the scaling images, we obtain the detection images I ADF that are 
flat, in the sense that the standard deviations in their background 
areas (outside of the sources) are approximately the same. The 
flattening procedure can be expressed as 

iiDF = Iad I AF l = Iad (Qaa * m~ l • (21) 

The second and final extraction is performed the same way and 
with the same parameters as the initial extraction 22 . The only dif- 
ference is that the detection images I AD are replaced with their 
flattened versions J^df and that, instead of the initial guesses for 
the maximum sizes of the sources, the actual maximum sizes 
^max derived m me initial extraction are used. In both extrac- 
tions, the measurements are performed on the same observed 
images I A0 . 

The flattening procedure is illustrated in Fig. 19. For refer- 
ence, the upper-left panel shows the original detection image I AF > 
at 350 jim of the simulated star-forming region used in this pa- 
per for illustrations of the method; the entire l°xl° is shown 
here in order to clearly see the flattening effect. The simulated 
background in this field has a temperature gradient along one 
diagonal of the image, clearly visible in the images. The dust 
temperature 7^ linearly varies from 20 K in the upper-left corner 
down to 15 K in the lower-right corner, where the background 
appears much brighter at 350 /mi. The footprint image shown 
in the upper-middle panel is the image T A (Fig. 15) somewhat 
expanded by convolution, to make sure that the resulting clean 
background has no residual artifacts that would influence the 
quality of the final flattening image. The image is used to re- 
move all sources from the detection image with our background 
interpolation scheme (Sect. 2.6). 

21 The depth is automatically adjusted by lowering 4 configuration 
parameters to the following values: S rel - 4, S ten - 3, n det = 1, n = 1.1. 

22 The three parameters automatically lowered in the initial extraction 
are now set to their default values: S re i = 7, S ten = 5, n det = 2, n = 1.35. 
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Fig. 19. Flattening of the detection images (Sect. 2.8). The entire l°xl° simulated star-forming region at 350//m is shown (upper left), the central 
area of which was used to illustrate getsources in this paper. The image of the converged footprints Ta somewhat expanded by convolution 
(upper middle) is used to interpolate all detected sources off the image to obtain the clean background image (upper right). The image T>x of 
the standard deviations (lower left) is computed in a very small (3x3 pixels) sliding window and further median-filtered using a 21x21 pixels 
sliding window to reduce effects of possible artifacts. The smoothed scaling image I AF (lower middle) is obtained by convolving the image in 
the previous panel with a circular Gaussian beam Qxa larger than the maximum size of sources extracted at 350 fim in the initial extraction. The 
resulting flattened image Xi DF (lower right) is obtained by dividing the original image I AD by the scaling image For better visibility, the values 
displayed in the panels are somewhat limited in range; the color coding is a linear function of intensity in MJy/sr. 



The interpolated background in Fig. 19 is smooth and clean, 
except for a couple of artifacts at its lower edge. The images 
that are bright and variable at their edges may lead to such edge 
artifacts, because getsources uses convolution and interpolation. 
Although the images are sufficiently expanded (extrapolated) by 
the algorithm before convolution, they remain essentially un- 
known beyond their edges. This type of artifacts never happens 
with the entire images from real- world observations that produce 
very low intensities at their edges, due to the baseline subtrac- 
tion. Such problems may only exist in simulated images or in 
sub-fields that have been cut out of full larger images 23 . 

From the background image getsources creates the image 
of standard deviations T>x, computed in a very small (3x3 pix- 
els) sliding window. The aim here is not to produce statistically- 
meaningful values, but to ensure that the features of T>x remain 
as local as possible; much larger windows would smooth out 
the values, which may not be suitable for the original highest- 



23 A natural remedy is to define the sub-fields that are larger than the 
area one is interested in studying and make sure that the intensities at 
the edges of the extraction area are relatively low. 



resolution images. The image T>\ is further median-filtered using 
a 21x21 pixels sliding window to reduce the effects of possible 
residuals or artifacts. The same image is used again to in- 
terpolate the median-filtered pixel values under the footprints, 
resulting in the image shown in the lower-left panel of Fig. 19. 
The scaling (flattening) image I^v is obtained by convolving the 
filtered T>x image with a circular Gaussian beam Qxa of a size 
3 times the maximum size of sources A™ ax measured in the ini- 
tial extraction; the smoothing ensures that the flattening does not 
distort the intensity distribution of even the largest sources. The 
scaling image I$p resembles fairly well the original tempera- 
ture gradient that was introduced in the simulated images along 
their diagonal. The resulting flattened image Xidf is the origi- 
nal image Ixd divided by Ixv- The large-scale background vari- 
ations clearly visible in the original detection image I A d have 
been mostly removed from the flattened image Xidf, making the 
latter suitable for the global single- scale thresholding applied by 
getsources in the final extraction. 
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3. Applications to Herschel images 

In Sect. 2, our multi-wavelength source extraction method was 
described and illustrated using the images of a simulated star- 
forming region. The simulation is one of our suite of bench- 
marks designed to aid in the development of getsources and in 
accurate tests of its performance in various conditions against 
the model images with fully known properties of the sources, 
background, and noise (the benchmarks will be described else- 
where; Men'shchikov et al., in prep.). In addition to the purely 
synthetic skies, getsources has been successfully tested on the 
ground-based millimeter continuum survey of the Aquila Rift 
complex (Maury et al. 2011), where all extracted sources have 
been carefully verified by eye inspection and their parameters 
evaluated manually. 

We present here two real-life examples of getsources extrac- 
tions for our Gould Belt and HOBYS surveys of the nearby and 
more distant star-forming regions with Herschel. For this pur- 
pose, we defined sub-fields of the observed images of Aquila 
and Rosette 24 , small enough to enable readers to verify the ex- 
traction results with their own eyes. We emphasize that the only 
goal of this presentation is to help the readers clarify various 
aspects of this new source extraction method; any astrophysical 
analysis of the fields is beyond the scope of this paper. 

3.1. A cluster of resolved prestellar cores in Aquila 

The observations, data reduction, and first results for the Aquila 
star-forming region (part of the Herschel Gould Belt survey, 
adopted distance Z) = 260pc) have been described by Andre 
et al. (2010); Men'shchikov et al. (2010); Konyves et al. (2010); 
Bontemps et al. (2010). The Aquila sub-field (365x365 3'.'0 pix- 
els), shows a cluster of cold prestellar cores, clearly visible in 
all SPIRE wavebands south-east of the bright W40 region in the 
Aquila field. The "cold cluster" was chosen to illustrate the per- 
formance of our new method for studying the earliest phases of 
low-mass star formation in the nearby regions (Figs. 21, 22). 

The lower-right panel of Fig. 21 shows a 3-color compos- 
ite image of the extracted sources in the Aquila sub-field, rep- 
resented by their elliptical deblending shapes Ixm (Eq. 14) with 
the measured peak intensities, sizes, and orientations. The image 
gives an overview of the source properties combining the infor- 
mation from the 500, 250, and 160yum bands in a single image. 
With just a few exceptions, all sources are red, yellow-red, and 
green, indicating that the radiation is emitted by cold starless 
objects, without any significant central energy source. Only 8 
protostars (blue, white-red, and white-pink sources) are visible 
in the composite and PACS images, while as many as 39 cold 
prestellar cores become detectable and measurable in the SPIRE 
wavebands, as can be seen in the lower panels of Fig. 21. 

The dense cold prestellar cores are clearly situated within 
a complex web of filamentary structures at significant intensity 
peaks; due to their nature, the cores must also coincide with 
significant column density peaks. The left panel of Fig. 22 dis- 
plays a column density image of the cold cluster, demonstrating 
that most extracted sources are indeed centered on the column 
density peaks. A couple of compact sources appear to be off- 
set from the column density peaks and intensity peaks at SPIRE 
wavelengths. These are the protostars prominent in the PACS 
bands that happened to either coincide (in projection) with those 
locations or form off-center in the dense clumps. We show in 



24 The respective sub-fields are similar to the areas displayed in Fig. 5 
of Konyves et al. (2010) and in Fig. 1 of Hennemann et al. (2010). 



Fig. 22 the ellipses of all 46 detected sources; there are 7 addi- 
tional sources that are not measurable at some wavelengths and 
therefore not visualized by ellipses in Fig. 21. The middle panel 
of Fig. 22 shows the same set of extraction ellipses on top of 
the combined detection image (at scales Sj < 80") that clarifies 
why getsources found the sources at all those locations. In ad- 
dition, the right panel shows the clean background image Iaocb 
at 500 jim that demonstrates that no significant sources in the 
Aquila sub-field were left undetected by getsources. 

These results (visualized in Figs. 21, 22) demonstrate that 
getsources handles very well the multi- wavelength Herschel ob- 
servations of resolved starless cores, the main ingredient of the 
Aquila sub-field. Although all unresolved protostars were also 
extracted, it is the next example that focuses on protostellar pop- 
ulation. 

3.2. Clustered unresolved protostars in Rosette 

The observations, data reduction, and first results for the Rosette 
star-forming region (part of the HOBYS survey, adopted dis- 
tance D= 1.6 kpc) have been described by Motte et al. (2010); 
Schneider et al. (2010); Hennemann et al. (2010); di Francesco 
et al. (2010). The Rosette sub-field (395x395 l'/4 pixels), shows 
a central part of the Rosette field with an extended bright area at 
70 yum and a number of unresolved isolated and clustered proto- 
stars in the PACS wavebands. This sub-field of Rosette was cho- 
sen to illustrate the performance of getsources for studying faint 
unresolved protostars in distant star-forming regions (Figs. 23, 
24). 

Similarly to Fig. 21, the lower-right panel of Fig. 23 shows a 
3-color composite image of the extracted sources in the Rosette 
sub-field, represented by their elliptical deblending shapes with 
the measured peak intensities, sizes, and orientations. Note, 
however, that the color image is strongly affected by the differ- 
ence in spatial resolutions in the wavebands (by factors of ~3 
and 2) than the color image shown in Fig. 21, as most of the 
sources in the distant Rosette sub-field are unresolved even at 
70 yum. Most of the sources remain in all of the Herschel images 
displayed in the other panels of Fig. 23; they are deblended and 
remain measurable all the way to the lowest resolution of the 
500 yum band. Figures 23, 24 also highlight severe problems en- 
countered by the usual "monochromatic" algorithms when they 
extract sources independently at individual wavelengths and then 
associate sources based on their positions in the images with 
such greatly varying resolutions. 

The redder (colder) sources are mostly found in the lower 
half of the image, whereas the bluer sources are scattered over 
the Rosette sub-field. Here we will focus on the compact unre- 
solved sources visible quite clearly in the 70 yum waveband in rel- 
atively low-background conditions, which can be used to judge 
whether getsources is able to separate faint peaks that are close 
to each other. One can count 35 such sources that are mostly 
clustered in groups of two, three, or four; we will limit our dis- 
cussion to the six groups labeled A-F in Fig. 23. Aiming to clar- 
ify why getsources found the sources at those locations, the left 
panel of Fig. 24 shows the extraction ellipses of all sources on 
top of the combined detection image (at scales Sj < 12"). The 
middle panel of Fig. 24 displays the background- subtracted im- 
age giving the cleanest view of the sources and its right panel 
shows the clean background image Ixo cb at 70 yum demonstrat- 
ing, that no significant sources in the Rosette sub-field were left 
undetected by getsources. 

Group A consists of 4 protostars of different brightness but 
very similar separations between the neighbors, clearly distinct 
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Fig. 21. The Aquila sub-field (18'xl8') is shown as the observed images I A0 at 70, 160, 250, 350, 500yt/m (left to right, top to bottom) with the 
extraction ellipses (FWHM) of only measurable sources (F iAP > cr ap and F iAT > cr aT ) overlaid, as well as the composite 3-color RGB image 
(500,250, 160yt/m) created using the images I AM of the deblending shapes of each extracted source (lower-right). The default condition, that a 
tentative source must be detected in at least two bands, was used. Only the protostars are visible at lOfim, whereas at 160yum one starless core 
appears, the other cold sources becoming clearly visible in the SPIRE bands. For better visibility, the values displayed in the panels are somewhat 
limited in range; the color coding in the lower-right panel is linear, in all other panels it is a function of the square root of intensity in MJy/sr. 
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Fig. 22. The Aquila sub-field is shown as a column density image with a 37" resolution (left, Konyves et al. 2010), an accumulated clean combined 
detection image containing spatial scales of up to 80" (middle), obtained by summing up the single scales JdjC in that range, with the 500 //m 
ellipses of all detected sources overplotted. Also shown is the clean background I^ocb (right) at 500 iim. For better visibility, the values displayed 
in the panels are somewhat limited in range; the color coding in all panels is a function of the square root of column density and of intensity in 
MJy/sr. 
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Fig. 23. The Rosette sub-field (9'x9') is shown as as the observed images I X o at 70, 160, 250, 350, 500yt/m {left to right, top to bottom) with the 
extraction ellipses (FWHM) of only measurable sources {F iAP > cr ap and F iAT > cr aT ) overlaid, as well as the composite 3-color RGB image 
(500, 250, 70yt/m) created using the images I AM of the deblending shapes of each extracted source {lower-right). The default condition, that a 
tentatve source must be detected in at least two bands, was used. Most of the compact sources visible at 70//m are unresolved protostars; several 
groups of them, discussed in Sect. 3.2, are labeled A-F. For better visibility, the values displayed in the panels are somewhat limited in range; the 
color coding in the lower-right panel is linear, in the other panels it is a function of the square root of intensity in MJy/sr. 
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Fig. 24. The Rosette sub-field is shown as the accumulated clean combined detection image containing spatial scales of up to 12" {left), obtained 
by summing up the single scales Jd/c in that range, with the 160//m ellipses of all detected sources overplotted. Also shown are the background- 
subtracted image J^obs {middle) and clean background Iaocb, {right) at 70 iim\ when added together, they make up the original 70 iim image in 
Fig. 23. For better visibility, the values displayed in the panels are somewhat limited in range; the color coding is a function of the square root of 
intensity in MJy/sr. 
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at 70yum and extracted by getsources across all wavelengths. 
Group B presents a more difficult case of 4 sources with decreas- 
ing brightess and distance between the members; the faintest 
protostar on top of group B is almost merged with its neighbor, 
but still has been detected and measured in 4 bands. A similar 
case is displayed by group C, where one faint source is situated 
between two other brighter sources; the faintest source is mea- 
surable only in the highest-resolution image at 70 jim. In group 
D we have another similar cluster of 4 sources with an extremely 
faint source surrounded by the brighter ones; most members of 
group D are measurable across all wavelengths. Group E is cor- 
rectly extracted as two close companions, with only the brighter 
source being measurable in all wavebands. Group F is two rela- 
tively bright protostars at a very small separation. Although they 
are practically merged together, with a saddle point just a few 
percent below the peaks, the binary is detected as such and both 
components are measurable up to 250 /mi. 

The results presented in Figs. 23, 24 show that getsources 
handles very well the multi-wavelength Herschel observations 
of unresolved protostars, the main ingredient of the sub-field. 
It fully preserves the highest-resolution information from the 
70yum waveband and uses it to correctly identify and measure 
close companions in all groups of protostars at all wavelengths. 

4. Conclusions 

The multi-scale, multi- wavelength source extraction method get- 
sources presented in this paper, was designed primarily for use 
in large far-infrared and submillimeter surveys of star-forming 
regions with Herschel. Instead of following the traditional ap- 
proaches of extracting sources directly in the observed images 
(Sect. 1.1), the method analyzes highly- filtered decompositions 
of original images over a wide range of spatial scales (Sect. 2.2). 
The algorithm separates the peaks of real sources from those 
produced by the noise and background fluctuations (Sect. 2.3) 
and constructs wavelength-independent sets of combined single- 
scale detection images (Sect. 2.4) preserving spatial information 
from all wavebands. Sources are detected in the combined de- 
tection images by tracking the evolution of their segmentation 
masks across all spatial scales (Sect. 2.5). Source properties are 
measured in the original background- subtracted and deblended 
images at each wavelength in iterations (Sect. 2.6). Additional 
catalogs and images are produced to aid in the analysis of the 
extraction results (Sect. 2.7), complementing the main catalog 
of sources. Based on the results of the initial extraction, detec- 
tion images are flattened to produce much more uniform noise 
and background fluctuations in preparation for the second, final 
extraction (Sect. 2.8). The performance of the new method on 
Herschel images was illustrated by extracting sources in small 
sub-fields of the Aquila and Rosette regions (Sect. 3). 

There are several significant advantages of getsources over 
other existing methods of source extraction. (1) The fine spa- 
tial decomposition filters out irrelevant spatial scales and im- 
proves detectability, especially in the crowded regions and for 
extended sources. (2) The multi- wavelength design enables com- 
bining data over all wavebands, eliminating the need to match 
independent extraction catalogs and enabling substantial super- 
resolution in the images with lower spatial resolution. (3) The 
single-scale detection algorithm identifies sources and deter- 
mines their characteristic sizes, avoiding spurious peaks on top 
of large structures and filaments. (4) The background subtrac- 
tion and deblending, based on the wavelength-dependent foot- 
print of each source, disentangle crowded regions with overlap- 
ping sources. (5) The extraction process is fully automatic and 



there are no free parameters involved: the default configuration 
works best in all cases that have been tested. 

A disadvantage of the algorithm is that it may not be very 
fast and it may require considerable storage space, depending 
on the numbers of pixels, spatial scales, wavelengths, iterations, 
and potential sources detected (Appendix G); most of the space 
can be freed, however, after the extraction has been completed. 

The method has been thoroughly tested using many simu- 
lated benchmark images and real-life observations. In particu- 
lar, the overall benchmarking results (Men'shchikov et al., in 
prep.) have shown that getsources comes on top of the other 
source extraction methods that we have tested (Sect. 1.1) in both 
the completeness and reliability of source detection and the ac- 
curacy of measurements. The source extraction code is auto- 
mated, very flexible, and easy-to-use; the code and validation 
images with a reference extraction catalog are freely available 
(see Appendix G). 
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Appendix A: Astrophysical objects: dense cores 

The primary goal of this work is to develop a source extraction 
method suitable for the systematic detection and measurement of 
dense cores in molecular clouds with Herschel: one of the main 
observational objectives of the Gould Belt survey (Andre et al. 
2010) is to obtain a complete census of such prestellar cores in 
nearby molecular clouds. 

The structure of molecular clouds is often filamentary (e.g., 
Schneider & Elmegreen 1979); it is also known to be highly 
hierarchical and self- similar over a wide range of scales (e.g., 
Falgarone et al. 1991). This structure can be attributed to the role 
of supersonic interstellar turbulence (e.g., Larson 1981) and is 
reasonably well described by fractal models (e.g., Elmegreen & 
Falgarone 1996). However, interstellar turbulence dissipates on 
small scales; coupled to the effects of gravity in gravitationally- 
bound clouds, this breaks the self- similarity of cloud structure 
on scales below ~ 0.1 pc (e.g., Williams et al. 2000). The lat- 
ter is the typical scale below which prestellar cores, the self- 
gravitating condensations of gas and dust giving birth to indi- 
vidual stars or systems, are observed in molecular clouds (e.g., 
Motte et al. 1998, 2001; Andre et al. 2000). Prestellar cores 
are observed at the bottom of the hierarchy of interstellar cloud 
structures and depart from Larson (1981)'s self- similar scal- 
ing relations. They correspond to "coherent" regions of nearly 
constant and thermal velocity dispersion which do not obey 
Larson's power-law linewidth vs. size relation (e.g., Myers 1983; 
Goodman et al. 1998). The 18" angular resolution of Herschel at 
250 yum, equivalent to 0.03 pc at a distance of 350 pc, is sufficient 
to resolve the typical Jeans length in the nearby clouds; this is 
also the characteristic diameter expected for Bonnor-Ebert-like 
cores. 

To first order, known prestellar cores have simple, convex, 
not very elongated shapes, and their density structure approaches 



24 



Men'shchikov et al.: A multi-scale, multi- wavelength source extraction method 



that of the Bonnor-Ebert isothermal spheroids bound by the ex- 
ternal pressure exerted by the parent cloud (e.g., Johnstone et al. 
2000; Alves et al. 2001). Conceptually, a dense core may be de- 
fined as the immediate vicinity of a local minimum in the gravi- 
tational potential of a molecular cloud, corresponding to the part 
of the cloud under a given local gravitational influence. While, 
in general, the gravitational potential cannot be inferred from 
observations, it turns out to be directly related to the observable 
column density distribution for the post-shock, filamentary cloud 
layers produced by supersonic turbulence in numerical simula- 
tions of cloud evolution (Gong & Ostriker 2011). In practical 
terms, this means that one can define a dense core (more pre- 
cisely, its projection onto the plane of the sky) as the immediate 
vicinity of a local maximum in observed column density maps, 
such as those derived from Herschel imaging, where dust con- 
tinuum emission is largely optically thin and directly traces col- 
umn densities. The source extraction method presented in this 
paper offers a new approach to the detection and measurements 
of sources, making full use of the multi-scale, multi-wavelength 
nature of the source extraction problem in the case of Herschel 
data. Analyzing a wide range of spatial scales, our method is 
able to detect the hierarchical structures of molecular clouds (cf . 
Sect. 2.5). Unlike such techniques as the dendrogram analysis 
(Rosolowsky et al. 2008), however, getsources is not explicitly 
designed to characterize the hierarchy of structures; our main 
focus is on the "compact" sources, at the end of the hierarchy. 

Appendix B: List of symbols 

For convenience of the readers, we list and define all symbols 
introduced in Sect. 2 of this paper: 

thx images of the annuli around all detected sources 

T>x images of local standard deviations for flattening 

Th image of the initial footprints after source detection 

Tx images of source footprints in measurement iterations 

T\ images of source footprints expanded by convolution 

Qj smoothing Gaussians in successive unsharp masking 

Qx smoothing Gaussians used to create detection images 

Qxa smoothing Gaussians used in the flattening procedure 

J Dj c clean detection images combined over wavelengths 

?DjCi partial detection images above the sub-level ojji 

I' D j C clean detection images combined over wavelengths 

Jd;s single-scale segmentation images of source masks 

XvtOBSD background- subtracted, deblended images of sources 

Ix original observed images produced by a map-maker 

XiDF flattened detection images for the final extraction 

I XT> detection images: either Ixo or transformed Ixo 

IxDj single- scale decompositions of the images Ixd 

?ADjc single-scale images cleaned of noise and background 

Xidc full clean images reconstructed from IxDjc 

Xidj r single-scale images of the cleaning residuals 

Xidr full residuals images reconstructed from IxDjR 

I xv scaling image smoothed by convolution 

Ixm images of the deblending shapes of all sources 

Ixo measurement images: I A resampled to pixel A 

Xiobs background- subtracted images of all detected sources 

Xiocb clean-background images with all sources removed 

Mbc source mask images accumulated over scales and X 

Mx observational mask images defining areas of interest 

Txj threshold images with all pixels set to TUxj 

at major length of a source segmentation mask at ojji 

A[ major FWHM size of a source i 

A/f major axis of the footprint ellipse of a source / 



Aiyx major axis of the footprint ellipse of a source / at X 

Aix major FWHM size of a source i measured at X 

^max maximum FWHM sizes of sources to be extracted 

bi minor length of a source segmentation mask at ojji 

B t minor FWHM size of a source i 

minor axis of the footprint ellipse of a source i 

BiYx minor axis of the footprint ellipse of a source / at X 

Bix minor FWHM size of a source i measured at X 

Ci contrast of a source i above the sub-level coji 

£ mm required minimum contrast C; for real sources 

C/a amplification factor in the detection of noise peaks 

C,-E elongation factor in the detection of noise peaks 

CiEj F elongation factor at the footprinting scale y'p 

Cixj contrast of a source i above the threshold tuxj 

fi global flag: information on source global properties 

fix monochromatic flag: information on source at each X 

fs scale factor defining relative spacing between scales 

fxj turn-on factor for combining scales when Sj < Ox 

F/hi flux integrated over the source mask above ojji 

Fi\ flux integrated over the source mask below coji 

F a p background- subtracted and deblended peak intensity 

FiXT background- subtracted and deblended total flux 

Gi goodness of an extracted source / 

/ running number of a source in the extraction catalog 

Iij peak intensity of a source / at scale j 

IiXj peak intensity of a source i in the image IxDjc 

IiXj F peak intensity of a source i in IxDjc at scale yp 

Ixj pixel intensity in a single-scale detection image 

IJaxjc maximum intensity over the clean image IxDjc 

j running number of a decomposed spatial scale 

j F number of the footprinting scale of a source 

kxj kurtosis in the single-scale residuals IxDjR 

£max maximum allowed value of kxj during cleaning 

/ running number of the intensity sub-level coji 

ftdet minimum number of /Ts a source must be detected at 

tixj variable number of standard deviations cr A j in rnxj 

N number of wavelengths used in the source extraction 

TVs number of spatial scales in the image decomposition 

N U j number of pixels in a partial detection image IvjC i 

N™ n minimum value of Nuj for detecting sources in lujci 

Nua number of pixels in a cluster of connected pixels 

N™ n minimum value of Nua in IxDjc for making lujc 

Ox observational angular resolution: FWHM beam size 

O observational beam size averaged over wavelengths 

Ri reliability of an extracted source i 

sxj skewness in the single-scale residuals IxDjR 

s max maximum allowed value of sxj during cleaning 

Sj spatial scale: FWHM of a smoothing Gaussian beam 

Sj F characteristic footprinting scale of a source in Ivjc 

Sj F x characteristic footprinting scale of a source in IxDjc 

Smax largest spatial scale in a single-scale decomposition 

Wx weight enhancing contribution of high-res. images 

Xi, y t source coordinates obtained in the detection process 

or;, Si source coordinates in the equatorial system 

7] parameter in the detection of noise peaks 

y weighting power-law exponent defining w A 

A pixel size (the same for all images in an extraction) 

X wavelength (central wavelength of a waveband) 

fiixj third statistical moments about the mean value 

fi4Xj fourth statistical moments about the mean value 

TUxj iterated cleaning thresholds (cut-off levels) 

o~ xj standard deviation in a single- scale image 

o~ x standard deviation in a full image 
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cr A j F standard deviation in a detection image at scale j'f 

crap uncertainty of the peak intensity Fap of a source 

craj uncertainty of the total flux F a j of a source 

II/, n m pixels /, m in the clean combined images T^jc 

®iF/i position angle of the elongation of a footprint 

®a position angle of the elongation of a source 

Hf global significance over all wavelengths A 

Ea monochromatic significance of a source i at X 

H re i reliable level of signficance for extracted sources 

H ten tentative level of signficance for extracted sources 

Qi global signal-to-noise ratio over all wavelengths 

£la monochromatic signal-to-noise ratio of a source / 

cjji sub-levels of intensities during source detection 

Appendix C: Simulated star-forming region 

To illustrate the spatial decomposition and all other processing 
steps of getsources in this paper, we use a simulated star-forming 
region that we constructed well before the launch of Herschel in 
order to have a reasonably realistic model; it is sufficient to give 
here the following brief description 25 . 

The simulated star- forming region, placed at 140 pc, consists 
of a synthetic (scale-free) cirrus background fitted to a typical 
lOOyum intensity of the backgrounds in the Gould Belt survey 
and scaled to all other Herschel wavebands assuming a black- 
body with a dust temperature of 17.5 K (cf. Lagache et al. 1999). 
The background was populated with 360 starless cores and 107 
protostars with realistic intensity distributions, obtained from 
a large grid of 129 one-dimensional dust continuum radiative 
transfer models. The density distribution of the cores followed 
the structure of critical Bonnor-Ebert spheres, whereas the proto- 
stars had p oc r~ 2 density profiles; both types of objects were em- 
bedded in background spherical clouds with a uniform density. 
The standard isotropic interstellar radiation field was shining on 
the outer edges of the clouds, making the temperature profile in- 
verted (lower in the center) in all objects. Protostars, however, re- 
stored usual temperature distributions deeper towards their cen- 
tral parts, as they produced an accretion luminosity (L^ocM); 
they had accreted half of the mass of the cores they formed from 
(the conceptual borderline between Class and Class I objects, 
Andre et al. 2000). Starless cores consisted of low-, medium-, 
and high-density sub-populations and were distributed accord- 
ing to the Mbe ^be relation for the isothermal Bonnor-Ebert 
spheres (T BE = 7, 14, 28 K) in the area of the mass-radius di- 
agram occupied by prestellar cores observed in the Orion and 
Ophiuchus star-forming regions (Motte et al. 1998, 2001). All 
populations span wide ranges of masses (0.01-10 M©) and radii 
(0.001-0.1 pc). Random noise (at the expected levels of the in- 
strumental noise) was added to all pixels of the simulated images 
and the latter were convolved to the expected observational reso- 
lutions of 5, 7, 1 1, 17, 24, and 35" in all PACS and SPIRE bands 
at 70, 100, 160, 250, 350, and 500 /mi. 

Appendix D: A look in the Fourier domain 

The original images and their single-scale decompositions can 
be transformed into the Fourier domain. The successive unsharp 
masking described by Eq. 1 (Sect. 2.2) can also be formulated 
in terms of the Fourier transforms of the images. For those read- 
ers who are used to the tranformations between the image and 

25 Full description of this and other synthetic skies will be given in 
another paper (Men'shchikov et al., in prep.) devoted to benchmarking 
of source extraction algorithms. 



Fourier domains, we present some additional images that may 
be useful for better understanding of getsources. 

In Fig. D.l, we show the amplitudes of the complex Fourier 
transform for the simulated star-forming region that is used in 
this paper to illustrate getsources (cf. Appendix C). In the simu- 
lated images, one exactly knows all individual components: the 
model sources, background, and noise; the amplitudes of the 
three components are displayed in the top panels of Fig. D.l. 
The Fourier transforms of the sources and noise are clearly 
dominated by the fact that the original images have a resolu- 
tion of the 350 /mi Herschel band: higher spatial frequencies 
have been suppressed by the convolution with the observational 
beam. Although the transform of the synthetic background is 
also shaped by the limited angular resolution at high frequencies, 
the steep power spectrum P(q) oc q~ 3 of the synthetic background 
carries most of its power on large scales (cf. Appendix E) and 
hence the amplitude remains strongly peaked at zero frequency. 
The full simulated image has all components added together and 
at each spatial frequency the Fourier amplitude becomes a mix- 
ture of the noise, background, and sources that cannot be cleanly 
separated anymore (Fig. D.l, middle-left panel). 

Our source extraction method attempts to give a practical so- 
lution to the problem of separating sources from all other com- 
ponents by decomposing the original images in a large num- 
ber of fine spatial scales using a procedure that involves suc- 
cessive convolution and subtraction of the images (cf. Sect. 2.2, 
Eq. 1). In effect, such a decomposition creates a set of the filtered 
"single-scale" images I^Dj each containing considerable sig- 
nals from only a limited range of spatial scales (or frequencies) 
that are determined by the size Sj of the smoothing Gaussian 
beam Qj. Consequently, in the Fourier domain, the single-scale 
images look like toroidal structures of variable widths heavily 
overlapping with each other over a substantial range of frequen- 
cies (Fig. D.l). By construction, the simulated sky does not in- 
clude any asymmetric or highly-elongated (filamentary) struc- 
tures, therefore the toroids look very regular and symmetric 
in the Fourier domain. The actual fields observed by Herschel 
are, however, more complicated, containing lots of filamentary 
structures, that make the toroids less symmetric. To illustrate 
their appearance in the real observations, we present in Fig. D.2 
Fourier amplitudes of a few single- scale images for the Aquila 
and Rosette star- forming regions. 

The Fourier space representation is useful for understanding 
some aspects of our method, as well as for image convolution 
(for which we apply a discrete fast Fourier transform algorithm). 
However, getsources is not entirely translatable into the Fourier 
domain and remains fundamentally the image-oriented method 
of source extraction. 

Appendix E: Power spectra of image components 

The ability to detect compact sources, such as dense cores, in 
the wide-field images obtained as part of the Gould Belt and 
HOBYS surveys with Herschel is primarily limited by the con- 
fusion arising from the small-scale cloud structure. In contrast, 
the instrumental noise and mapping artifacts are often negligible. 
An important property of the background cloud fluctuations is 
that they do not follow Gaussian statistics. In particular, it is well 
known that the power spectrum of interstellar cloud fluctuations 
strongly depends on spatial scales (P(q) oc q~ 3 , where q is spatial 
frequency; e.g., Roy et al. 2010, and references therein). This 
is illustrated in Fig. E.l, which shows the power spectrum of a 
SPIRE 250 yum image of the Polaris cirrus cloud (see also Fig. 3 
of Miville-Deschenes et al. 2010). In practice, this means that the 
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Fig.D.l. Fourier transform of the simulated star-forming region used in this paper (Appendix C) at 350 //m with a 24" resolution (Fig. 19, upper- 
left). The top panels show the Fourier amplitudes of separate components: the model sources, synthetic background, and noise {top left to right). 
The full image containing all the ingredients (and all spatial scales) is displayed in the following {middle-left) panel. The remaining panels {left 
to right, top to bottom) show the images of the single-scale decompositions of the simulated sky (cf. Fig. 2) with the scale sizes Sj annotated at 
the bottom of the panels. For the original images with 2048x2048 2" pixels, the panels present the Fourier amplitudes within the range of spatial 
frequencies from zero to one-fourth of the Nyquist frequency (0.25 arcsec -1 ). For better visibility, the pixel values are somewhat limited in range; 
the color coding is linear. 



background fluctuations are much stronger on larger scales and, 
unlike Gaussian fluctuations, cannot be characterized by a single 
value of the standard deviation. Indeed, they are better described 
by a wide range of standard deviations, as reflected in their 
power spectrum (Fig. E.l). Accordingly, the probability den- 



sity function (PDF) of submillimeter intensity (or column den- 
sity) variations in the Herschel images of Galactic fields is not 
Gaussian, but typically lognormal in non- star-forming clouds, 
such as Polaris (Schneider et al., in prep.; see also Kainulainen 
etal. 2009). 



27 



Men'shchikov et al.: A multi-scale, multi- wavelength source extraction method 





Rosette 9" scale I Rosette 1 8" scale I Rosette 36" scale 



90 179 270 360 450 540 630 721 810 798 1595 2401 3198 4004 4802 5599 6405 7202 2194 4387 6602 8796 11011 13204 15398 17613 19806 

Fig.D.2. Fourier transform of the actual star-forming regions observed by Herschel at 350//m with a 25" resolution. Shown are the amplitudes 
of the decomposed images of Aquila {top left to right) and Rosette {bottom left to right); the scale sizes Sj are annotated at the bottom of the 
panels. For the original Aquila images with 4096x4096 3" pixels and Rosette images with 2048x2048 3" pixels, the panels present the Fourier 
amplitudes within the range of spatial frequencies from zero to one-third of the Nyquist frequency (0.17 arcsec -1 ). For better visibility, the spatial 
frequencies and the pixel values are somewhat limited in range; the color coding is linear. 



In contrast to the astrophysical backgrounds, the instrumen- 
tal noise fluctuations are approximately Gaussian, having a flat 
power spectrum, essentially independent of spatial frequency 
(white noise). This is illustrated in Fig. E.l, which shows the 
estimated power spectrum of the instrumental noise in a typ- 
ical Herschel image at 250/mi, derived from the power spec- 
trum of the PACS 70 /mi image of the Polaris field with no sig- 
nificant cloud emission (cf. Men'shchikov et al. 2010; Miville- 
Deschenes et al. 2010; Andre et al. 2010). For a better compari- 
son with the power-law cirrus profile in Fig. E.l, the spatial fre- 
quencies sampled in the original PACS 70 jim data were scaled 
down by a factor 70/250, to represent the typical range of spa- 
tial frequencies in the SPIRE 250 yum images of the Herschel 
Gould Belt survey. Likewise, the power spectrum of the intensity 
distribution of a Gaussian source with a half-maximum width 
D is itself Gaussian and thus flat approximately up to its half- 
maximum width 2 3/2 In 2 (n D)~ l in spatial frequencies. 

Figure E.l illustrates why decomposing the observed im- 
ages into a finely-spaced set of filtered images is advantageous 
for source extraction. Given the typical power spectra of the 
background fluctuations, instrumental noise, and Gaussian-like 
sources, the maximum contrast of a source with a size D over 
the background fluctuations is obtained at spatial frequencies 
0.46 Z)" 1 or, equivalently, for spatial scales close to 2.2 D. By 



performing source detection in a fine grid of single- scale images, 
getsources can automatically identify and use the scale at which 
the contrast of each source over the background is maximized, 
thereby improving source detectability. 

Another great advantage of the fine spatial decomposition 
employed by getsources (Sect. 2.2) is that the emission fluc- 
tuations in the decomposed single- scale images of interstellar 
clouds follow Gaussian statistics much more closely than the 
cloud fluctuations in the observed images containing all spatial 
scales (cf. Fig. 5). This is because each single-scale image of a 
cirrus cloud can be characterized by a single standard deviation 
value much better than the original image, since it selects only a 
narrow range of spatial frequencies from a power-law spectrum 
of standard deviations, such as that shown in Fig. E.l. 

Appendix F: Estimating shapes of sources 

We derive the source coordinates, sizes, and orientations from 
the moments of the background- subtracted, deblended intensity 
distributions over the pixels of their footprints using the first and 
second moments 

f xl(r) d 2 r 

E(jc) = — r — , (El) 

f I{r)d 2 r 
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Fig. E.l. Typical power spectra of several components contained in ob- 
served Herschel images. Red circles show the power spectrum of the 
SPIRE 250//m image of the Polaris cirrus cloud taken as part of the 
Herschel Gould Belt survey; no attempt was made here to correct the 
raw power spectrum for deviations of the SPIRE 250 yt/m beam from a 
Gaussian shape (cf. Miville-Deschenes et al. 2010). This cirrus back- 
ground is well described by a power law P(q) oc q~ 3 in the range of spa- 
tial frequencies 0.02 <q<2 arcmin -1 . Blue squares show the estimated 
power spectrum of the instrumental noise in a typical Herschel image at 
250 /im, which is flat over more than two orders of magnitude in q. The 
solid curve shows the power spectrum (scaled to an arbitrary level of 
power) of the intensity distribution of a Gaussian source with a size of 1 
arcmin (FWHM). The vertical dashed line marks the spatial frequency 
of 0.46 arcmin -1 , at which the contrast of such a source is maximum 
over the background fluctuations (assumed to follow Pocq~ 3 ). 
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f yl(r)d 2 r 

f I(r)d 2 r ' 
/ x 2 1(r)d 2 r 

f I(r)d 2 r 
f f I(r)d 2 r 

J I(r)d 2 r 
J xyl(r) d 2 r 

f I(r)d 2 r 
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where x, y are the Cartesian coordinates of a vector r = (x, y) and 
the pixel coordinates relative to the source barycenter are given 
by x = x-E(x) and y = y-E(y). Let us write the covariance matrix 



(F.6) 



where a 2 and cr 2 are the variances and cr xy is the covariance of 
the intensity moments. For such a symmetric matrix (cr xy = cr yx ), 
one can find eigenvalues A by solving the characteristic equation 





E(x 2 ) E(xy) 




' °~ 2 X °~ X y ' 




E(xy) E(f) 




_ o-yx cr 2 _ 



det (AT - A/) = 0, 



(F.7) 



where / is the identity matrix. This leads to the quadratic equa- 
tion 



A* = 



0-2 + 0-2 



2\ 2 



J/2 



+ crz 



(E9) 



defining the major and minor axes of an ellipse. The major axis 
forms an angle 6 with the x-axis, given by 



tan (260 



2cr 



xy 



o~ 2 — cr 2 



(F.10) 



A 2 -(a 2 x ^a 2 )A^(cT 2 x cT 2 -cT 2 xy ) = 



(F.8) 



The major and minor FWHM sizes, as well as the position angle 
(east of north) of a source i at wavelength A, are finally computed 
from 

A iA = (A + 81n2) 1/2 , (Ell) 
B iA = (A"81n2) 1/2 , (F.12) 
Q iA = 7r-arctan0. (El 3) 



Appendix G: Installing and using getsources 

The source extraction method described above has been devel- 
oped by A. M. (since July 20, 2008) as a Bash script called get- 
sources and a suite of the FORTRAN utilities executed by the 
script that perform most of the work. This ensures a high degree 
of portability and efficiency, as these two languages are the de 
facto standards in the worlds of the UNIX-like operating sys- 
tems and numerical computations. Either Mac OS X or Linux 
and either ifort 11.1 or gfortran 4.5 can be used to install get- 
sources; other systems and compilers have not been tested. A 
preparation script called prepareobs makes use of SWarp (Bertin 
et al. 2002) for image resampling and reprojection. To read and 
write images in the FITS format, getsources uses the CFITSIO 
library (Pence 1999); to convolve images, the fast Fourier trans- 
form routine rlft3 (Press et al. 1992) is used; the source coordi- 
nates (a, S) are computed by the utility xy2sky from WCSTools 
(Mink 2002). 

The total processing times are proportional to the numbers 
of pixels, spatial scales, wavelengths, iterations, and potential 
sources detected, depending also on the computer CPUs avail- 
able. The extraction time may vary between few minutes and 
a week, the latter being the longest time we have experienced 
and it refers to a 6-wavelengths extraction for 5°. 2x5? 2 images 
with 6234x6234 3" pixels, each image occupying -150 MB of 
disk space. On the other hand, a 6- wavelength extraction for the 
simulated sky used for illustrations in this paper (1800x1800 
2" pixels, each image taking -13 MB of disk space) was com- 
pleted within one day. Although at first glance that may seem a 
long computational time, getsources is not a real-time software; 
the completeness and reliability of the source extraction, not its 
speed, were the priorities in its design. In the kind of astronom- 
ical research we are dealing with, even one week of processing 
is never a limiting factor, if the work is properly planned. A re- 
searcher would spend much more time on the analysis of the 
information delivered by the code (catalogs, images) and on the 
studies of the astrophysical reality of interest (in our case, the 
star formation). 

Processing speed depends on many circumstances in a given 
computing system. Users are advised to always run getsources 
on local (internal) hard drives physically attached to the CPUs 
used for extractions. At some steps, the code performs lots of 
read and write (I/O) operations on FITS files and the users would 
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benefit from the fastest possible I/O throughput 26 . Our algorithm 
is able to use virtual random-access-memory (RAM) disks to 
speed up the I/O for the images and to reduce the processing 
times. However, the gain compared to the speed of a local hard 
drive access may not be very significant. It is generally not opti- 
mal to use network-attached hard drives, as the disk access over 
networks is very slow compared to that of the local disks. If 
one must use the network storage for running getsources and 
has very large amounts of RAM, one might consider using the 
RAM disk facility of the code. This may lead up to a consider- 
able acceleration factor in processing times and thus compensate 
for the inefficient hard disk access over the networks. For multi- 
wavelengths extractions, a natural way of cutting down the com- 
putational times is to run the first two processing blocks (Fig. 1) 
in parallel, one wavelength per CPU, as they are decomposed 
and cleaned independently of each other. If the computational 
time becomes an issue for enormously large images, the users 
may want to split them into several sub-fields to obtain extrac- 
tions much faster, running the extraction in parallel on different 
CPUs. For example, one can get an acceleration factor of ~30 
by splitting images in only 3 sub-fields of equal size (the factor 
assumes the same number of sources in each sub-field and that 
the extractions are run in parallel). 

Another consideration is the storage space necessary to keep 
available intermediate images for many spatial scales and wave- 
lengths until the end of the extraction process; the space needed 
scales between hundreds of MB to tens of GB, depending on 
the image size and the numbers of the scales, wavelengths, and 
measurement iterations. In addition to the resulting catalogs and 
images produced by getsources, a relatively large number of in- 
termediate images and catalogs are also kept on the hard drive. 
Those are useful in case the processing is interrupted for some 
reason or if the user needs to restart the extraction from some 
previous step or to continue the measurement iterations until 
their convergence. Those restart images may also be very useful 
to inspect, in order to better understand the extraction results; 
however, this time-saving feature works, of course, at the ex- 
pense of the disk space. It is up to the user to decide what is the 
biggest issue, the time or the space. Whenever the user becomes 
satisfied with the extraction results or the time needed to re-run 
the extraction is not an issue, those extra files can be removed 
from the hard drive by getsources. 

The code is powerful, automated, flexible, easy-to-use, and 
very extensively tested; the algorithm is designed to be run on 
properly prepared images twice (the initial and final extractions) 
and none of a few parameters of the code need to be changed, as 
their default values have been carefully fine-tuned to work best 
in all cases. The multi- wavelength design of getsources is quite 
flexible and it allows one to use it in some special ways. For ex- 
ample, one can detect sources using only selected wavelengths 
(even a single image) but produce catalogs with measurements 
for all wavebands. It is also possible to add other non-Herschel 
images for either both detection and measurements or to only 
measurements, to use more information and get better results. 
One can also use special mask images to exclude problematic 
(e.g., saturated) areas at some of the wavelengths to avoid us- 
ing those areas in combining images over wavelengths and in 
detecting sources. Users can re-run only selected steps of the ex- 



26 To optimize the processing times, it is a good idea to test available 
disk storage for speed. The script iospeed enables comparisons of avail- 
able hard drives by reading and writing a FITS image multiple times. 
One can perform tests of the local and network disks, as well as of the 
virtual disks. 



traction and also restart the detection and measurements from 
any intermediate scale or iteration. There are also other possibil- 
ities; users are welcome to request information on their specific 
needs from the author. 

The source extraction code with an installation guide and a 
quick start guide are freely available upon request or they can be 
downloaded from our web pages 27 . Users installing getsources 
on their computers are advised to test it on a multi- wavelength 
extraction using Herschel images of the galaxy NGC4559 that 
was chosen as our validation field (the galaxy was observed as 
part of the KINGFISH project, see Kennicutt et al. 2011). This 
relatively quick extraction performed by the author can also be 
requested or downloaded by the users who wish to validate their 
installation and verify that they are able to reproduce the refer- 
ence extraction results. 
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